Abstract

We describe a distributed randomized algorithm computing approximate distances and routes
that approximate shortest paths. Let $n$ denote the number of nodes in the graph, and let HD denote
the hop diameter of the graph, i.e., the diameter of the graph when all edges are considered to have
unit weight. Given $0 < \varepsilon < 1/2$, our algorithm runs in $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$ communication rounds using
messages of $O(\log n)$ bits and guarantees a stretch of $O(\varepsilon^{-1} \log \varepsilon^{-1})$ with high probability. This
is the first distributed algorithm approximating weighted shortest paths that uses small messages
and runs in $\tilde{o}(n)$ time (in graphs where $\mathrm{HD} \in \tilde{o}(n)$). The time complexity nearly matches the
lower bounds of $\tilde{\Omega}(\sqrt{n} + \mathrm{HD})$ in the small-messages model that hold for stateless routing (where
routing decisions do not depend on the traversed path) as well as approximation of the weighted
diameter. Our scheme replaces the original identifiers of the nodes by labels of size $O(\log \varepsilon^{-1} \log n)$.
We show that no algorithm that keeps the original identifiers and runs for $\tilde{o}(n)$ rounds can achieve
a polylogarithmic approximation ratio.

Variations of our techniques yield a number of fast distributed approximation algorithms solving
related problems using small messages. Specifically, we present algorithms that run in $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$
rounds for a given $0 < \varepsilon < 1/2$, and solve, with high probability, the following problems:

• $O(\varepsilon^{-1})$-approximation for the Generalized Steiner Forest (the running time in this case has
an additive $\tilde{O}(t^{1+2\varepsilon})$ term, where $t$ is the number of terminals);

• $O(\varepsilon^{-2})$-approximation of weighted distances, using node labels of size $O(\varepsilon^{-1} \log n)$ and $\tilde{O}(n^{\varepsilon})$
bits of memory per node;

• $O(\varepsilon^{-1})$-approximation of the weighted diameter;

• $O(\varepsilon^{-3})$-approximate shortest paths using the labels $1, \ldots, n$.
1 Introduction



Constructing routing tables is a central task in network operation, the Internet being a prime example. 
Besides being an end goal on its own (facilitating the transmission of information from a sender to 
a receiver), efficient routing and distance approximation are critical ingredients in a myriad of other 
distributed applications. 

At the heart of any routing protocol lies the computation of short paths in weighted graphs, 
where edge weights may reflect properties such as link cost, delay, bandwidth, reliability etc. In the 
distributed setting, an additional challenge is that the graph whose shortest paths are to be computed 
serves also as the platform carrying communication between the computing nodes. The result of this 
double role is an intriguing interplay between two metrics: the given shortest paths metric and the 
"natural" communication metric of the distributed system. The first metric is used for the definition 
of shortest paths, where an edge weight represents its contribution to path lengths; the other metric 
is implicit, controlling the time complexity of the distributed computation: each edge is tagged by 
the time it takes a message to cross it. If these two metrics happen to be identical, then computing 
weighted shortest paths to a single destination is trivial (for the all-pairs problem, see below). For 
the general case, the standard normalization is that messages cross each link in unit time, regardless 
of the link weight; this assumption is motivated by network synchronization. On the other hand, the 
length of the message must be taken into account as well. More precisely, in the commonly accepted
CONGEST model of network algorithms [21], it is assumed that all link latencies are one unit and
messages have fixed size, typically $O(\log n)$ bits, where $n$ denotes the number of nodes.

The classical algorithm for computing shortest paths distributively is the distributed variant of the
Bellman-Ford algorithm. This algorithm is used in many networks, ranging from local to wide area
networks. The Bellman-Ford algorithm enjoys many properties that make it an excellent distributed
algorithm (locality, simplicity, self-stabilization). However, in weighted graphs, its time complexity,
i.e., the number of parallel iterations, may be as high as $\Omega(n)$ for a single destination. This is in sharp
contrast with the $O(\mathrm{HD})$ time needed to compute unweighted shortest paths to a single destination,
where HD denotes the unweighted "hop-diameter" of the network. The difference between $n$ and HD
can be huge; suffice it to say that the hop-diameter of the Internet is estimated to be smaller than
50. Intuitively, the problem originates in the fact that the Bellman-Ford algorithm explores paths in
a hop-by-hop fashion, and the aforementioned superposition of metrics may result in a path that is
weight-wise short, but consists of $\Omega(n)$ edges. If shortest paths have at most $\mathrm{SPD} \in \mathbb{N}$ edges, then it
suffices to run the Bellman-Ford algorithm for SPD communication rounds. Indeed, the running time
of a few distributed algorithms is stated as a function of this or a similar parameter for exactly this
reason (e.g., [6, 15, 16]).
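
To make the gap between HD and the number of Bellman-Ford iterations concrete, the following minimal sketch simulates the synchronous algorithm for a single destination. The function name `bellman_ford_rounds` and the example graph are illustrative assumptions and do not appear in the paper: node 0 is joined to nodes 2, ..., n-1 by heavy edges, so the hop diameter is at most 2, while a light path 0-1-...-(n-1) carries all shortest paths.

```python
import math

def bellman_ford_rounds(n, edges, dest):
    """Synchronous Bellman-Ford toward `dest` on an undirected weighted graph.

    edges is a list of (u, v, w) triples. Returns the final distance vector and
    the number of rounds in which at least one estimate still changed."""
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [math.inf] * n
    dist[dest] = 0
    rounds = 0
    while True:
        # Every node combines the estimates its neighbors held in the previous round.
        new = [min([dist[v]] + [dist[u] + w for u, w in adj[v]]) for v in range(n)]
        if new == dist:
            return dist, rounds
        dist, rounds = new, rounds + 1

n = 8
light = [(i, i + 1, 1) for i in range(n - 1)]  # path 0-1-...-7, weight 1 per edge
heavy = [(0, v, n) for v in range(2, n)]       # shortcuts of weight n, so HD <= 2
dist, rounds = bellman_ford_rounds(n, light + heavy, dest=0)
print(dist, rounds)
```

In this example the estimates only stabilize after n - 1 = 7 rounds even though every node is within two hops of the destination, which is the effect the SPD parameter captures.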

To the best of our knowledge, no distributed algorithm for computing (approximate) weighted 
shortest paths in o(SPD) time in the CONGEST model was known to date. In this paper we present 
a distributed algorithm that computes approximate all-pairs shortest paths and distances using small 
messages, in time that nearly matches the lower bound of $\tilde{\Omega}(\sqrt{n} + \mathrm{HD})$.
1.1 Detailed Contributions

Our main technical contribution, presented in Section 4, is an algorithm that, using messages of
size $O(\log n)$, constructs, for any $0 < \varepsilon < 1/2$, in $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$ rounds node labels¹ of size
$O(\log \varepsilon^{-1} \log n)$ and routing tables of size $\tilde{O}(n^{1/2+\varepsilon})$ facilitating routing and distance estimation with
stretch $O(\varepsilon^{-1} \log \varepsilon^{-1})$. We show that assigning new labels to the nodes is unavoidable by proving that
any (randomized) algorithm achieving polylogarithmic (expected) stretch without relabeling must run
for $\tilde{\Omega}(n)$ rounds. The running time of our algorithm is close to optimal, since known results [7, 8, 23]
imply that computing such an approximation in the CONGEST model must take $\tilde{\Omega}(\sqrt{n} + \mathrm{HD})$
rounds.

Our algorithm comprises two sub-algorithms that we believe to be of interest in their own right.
One is used for short-range routing (roughly, for the closest $\sqrt{n}$ nodes), the other for longer distances.
The short-range algorithm constructs a hierarchy in the spirit of Thorup-Zwick distance oracles [30]:
A recursive structure of uniformly sampled "landmarks" is used to iteratively reduce the number of
routing destinations (and routes) that need to be learned, and repeated use of the triangle inequality
shows that the stretch is linear in the number of recursion stages. While this idea is not new, our main
challenge is to implement the algorithm using small messages; to this end, we introduce a bootstrapping
technique that, combined with a restricted variant of Bellman-Ford (that bounds the hop range and
the number of tracked sources), allows us to construct low-stretch routing tables for nearby nodes.

This approach runs out of steam (i.e., exceeds our target complexity) beyond the closest $\tilde{O}(\sqrt{n})$
nodes, so at that point we switch to the "long-distance" scheme. The basic idea in this scheme is to
pick roughly $\sqrt{n}$ random nodes, which we call the skeleton nodes, and to compute all-to-all routing tables
for them. This is achieved by simulating the spanner construction algorithm by Baswana and Sen [3].
Again, the crux of the matter is an efficient implementation of this approach using small messages.
To this end, we first construct a spanner of a graph defined by the skeleton nodes and shortest paths
between them. Due to the small number of skeleton nodes and the reduced number of edges (thanks
to the spanner construction), we can afford to broadcast the entire skeleton-spanner graph, thereby
making skeleton routing information common knowledge. In addition, we can mark the corresponding
paths in the original graph quickly. Here too, our main low-level tool is the restricted Bellman-Ford
algorithm that bounds both the range and the load.

Using variants of our techniques, in Section 5 we derive efficient solutions to several related problems 
(all statements hold with high probability). 

• For the Generalized Steiner Forest (GSF) problem we obtain, for any $0 < \varepsilon < 1/2$, an $O(\varepsilon^{-1})$-
approximation within $\tilde{O}((\sqrt{n} + t)^{1+2\varepsilon} + \mathrm{HD})$ rounds, where $t$ is the number of terminals. This
should be contrasted with the best known distributed approximation algorithm for GSF [15],
which provides an $O(\log n)$-approximation in time $\tilde{O}(\mathrm{SPD} \cdot k)$, where SPD is the "shortest paths
diameter," namely the maximal number of hops in any shortest path, and $k$ is the number of
terminal components in the GSF instance.

• For any $k \in \mathbb{N}$, we obtain an $\tilde{O}((\sqrt{n})^{1+1/k} + \mathrm{HD})$-time algorithm that constructs labels of size
$O(k \log n)$ and local tables of size $\tilde{O}(n^{1/(2k)})$, and produces distance estimations with stretch
$O(k^2)$. Compare with the recent distributed algorithm [6] that attains the same local space
consumption at running time $\tilde{O}(\mathrm{SPD} \cdot n^{1/(2k)})$ and stretch $4k - 1$.

• Given any $0 < \varepsilon < 1/2$, we can compute an $O(\varepsilon^{-1})$-approximation of the diameter within
$\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$ rounds. We show that the standard construction yielding a lower bound of $\tilde{\Omega}(\sqrt{n} +
\mathrm{HD})$ extends to this problem, implying that also for this special case our solution is close to
optimal.

• Employing a different routing mechanism for the short-range scheme, we can assign the fixed
labels $1, \ldots, n$. This comes at the expense of a larger stretch of $O(\varepsilon^{-3})$ within $\tilde{O}(n^{1/2+\varepsilon} + \mathrm{HD})$
rounds, for any $0 < \varepsilon < 1/2$.

¹ We remark that our use of the term differs from the common definition in that we distinguish between the auxiliary
routing information stored by the nodes (the tables) and the (preferably very small) labels replacing the original node
identifiers as routing address.

1.2 Related Work 

There are many centralized algorithms for constructing routing tables; in these algorithms the goal 
is usually to minimize space without affecting the quality of the routes too badly. We briefly discuss 
them later, since our focus is the distributed model. At this point let us just comment that a naive 
implementation of a centralized algorithm in the CONGEST model requires $\Omega(|E|)$ time in the worst
case, since the whole network topology has to be collected at a single node just for computation.

Practical distributed routing table construction algorithms are usually categorized as either "distance
vector" or "link state" algorithms (see, e.g., [26]). Distance-vector algorithms are variants of
the Bellman-Ford algorithm [4, 11], whose worst-case time complexity in the CONGEST model is
$O(n^2)$. In link-state algorithms [19, 20], each routing node collects the complete graph topology and
then solves the single-source shortest path problem locally. This approach has $O(|E|)$ time complexity.
While none of these algorithms uses relabeling, it should be noted that the Internet architecture in fact
employs relabeling (IP addresses, which are used instead of physical addresses, encode some routing
information).

From the theoretical perspective, as mentioned above, there has not been much progress in computing
weighted shortest paths beyond the "shortest path diameter" (which we denote by SPD) even for
the single-source case: see, e.g., [6] and references therein. For the unweighted case, an $O(n)$-time
algorithm for exact all-pairs shortest-paths was recently discovered (independently) in [14] and [22].
These algorithms do not relabel the nodes. In addition, a randomized (3/2)-approximation of HD is
given in [22], and a deterministic $(1 + \varepsilon)$-approximation is provided by [14]. Combining results, [14]
and [22] report a randomized (3/2)-approximation of the unweighted diameter in time $\tilde{O}(n^{3/4})$.

In [7], a lower bound of $\tilde{\Omega}(\sqrt{n})$ on the time to construct a shortest-paths tree of weight within a
poly($n$) factor of the optimum is shown; this immediately implies the same lower bound on routing (more
precisely, on stateless routing, where routing decisions depend only on the destination and not on the
traversed path). To the best of our knowledge, the literature does not state any further explicit lower
bounds on the running time of approximate shortest paths or distance estimation algorithms, but a
lower bound of $\tilde{\Omega}(\sqrt{n})$ can be easily derived using the technique used in [7] (which in turn is based
on [23]). In [12] it is shown that in the CONGEST model, approximating the diameter of unweighted
graphs to within a factor of $3/2 - \varepsilon$ requires $\tilde{\Omega}(\sqrt{n})$ rounds. For the unweighted case, we extend this
result to arbitrary approximation ratios.
In the Generalized Steiner Forest problem (GSF), the input consists of a weighted graph and a set of
terminal nodes which is partitioned into subsets called terminal components. The task is to find a set of
edges of minimum weight so that the terminal components are connected. Historically, the important
special case of a minimum spanning tree (all nodes are terminals, single terminal component) has
been the target of extensive research in distributed computation. It is known that in the CONGEST
model, the time complexity of computing (or approximating) an MST is $\tilde{\Omega}(\sqrt{n} + \mathrm{HD})$ [7, 8, 23]. This
bound is essentially matched by an exact deterministic solution [13, 17]. An $O(\log n)$-approximate
MST is presented in [16], whose running time is $\tilde{O}(\mathrm{SPD})$, where SPD is the "shortest path diameter"
mentioned previously. For the special case of Steiner trees (arbitrary terminals, single component),
[5] presents a 2-approximation algorithm whose time complexity is $\tilde{O}(n)$ (which can easily be refined
to $\tilde{O}(\mathrm{SPD})$). For the general case, [15] presents an $O(\log n)$-approximation algorithm whose time
complexity is $\tilde{O}(k \cdot \mathrm{SPD})$, where $k$ is the number of terminal components.²

We now turn to a very brief overview of centralized algorithms. Thorup and Zwick [29] presented
an algorithm that achieves, for any $k \in \mathbb{N}$, routes of stretch $2k - 1$ using $\tilde{O}(n^{1/k})$ memory. In
terms of memory consumption, it has been established that this scheme is optimal up to a constant
factor in worst-case stretch w.r.t. routing [24]. This result has been extended to the average stretch,
and tightened to be exact up to polylogarithmic factors in memory for the worst-case stretch [1].
For distance approximation, the Thorup-Zwick scheme is known to be optimal for $k = 1, 2, 3, 5$ and
conjectured to be optimal for all $k$ (see [31] and references). The algorithm requires relabeling with
labels of size $O(k \log n)$. It is unclear whether stronger lower bounds apply to name-independent
routing schemes (which keep the original node identifiers); however, for $k = 1$, i.e., exact routing,
trivially $O(n \log n)$ bits suffice (assuming $O(\log n)$-bit identifiers), and Abraham et al. [2] prove a
matching upper bound of $\tilde{O}(\sqrt{n})$ bits for $k = 2$.

A closely related concept is that of sparse spanners, introduced by Peleg and Schaffer [25]. A $k$-spanner
of a graph is obtained by deleting edges, without increasing the distances by more than factor
$k$. Similarly to compact routing tables, it is known that a $(2k - 1)$-spanner must have $\Omega(n^{1+1/k})$ edges
for some values of $k$; this is conjectured to hold for all $k \in \mathbb{N}$, and a matching upper bound is obtained
by the Thorup-Zwick construction [30]. If an additive term in the distance approximation is permitted,
the multiplicative factor can be brought arbitrarily close to 1 [9]. In contrast to routing and distance
approximation, there are extremely fast distributed algorithms constructing sparse spanners. Our
long-range construction rests on an elegant algorithm by Baswana and Sen [3] that achieves stretch
$2k - 1$ vs. $O(n^{1+1/k})$ expected edges within $O(k)$ rounds in the CONGEST model.

2 Model 

In this section we define the model of computation and formalize a few concepts we use. 

² We note that in [15], time-optimality is claimed, up to factor $O(k)$. This comes as a consequence of [16], which in
turn builds on [8]. However, we comment that the latter construction does not scale beyond the familiar lower bound
of $\tilde{\Omega}(\sqrt{n})$, and a more precise statement would thus be that a minimum spanning tree (and thus also a GSF) requires
$\tilde{\Omega}(\min\{\mathrm{SPD}, \sqrt{n}\})$ rounds to be approximated.
2.1 The Computational Model



We follow the CONGEST($B$) model as described in [21]. The distributed system is represented by
a simple, connected weighted graph $G = (V, E, W)$, where $V$ is the set of nodes, $E$ is the set of edges,
and $W : E \to \mathbb{N}$ is the edge weight function.³ As a convention, we use $n$ to denote the number of
nodes. We assume that all edge weights are bounded by some polynomial in $n$, and that each node
$v \in V$ has a unique identifier of $O(\log n)$ bits (we use $v$ to denote both the node and its identifier).

Execution proceeds in global synchronous rounds, where in each round, each node takes the following
three steps: (1) receive messages sent by neighbors in the previous round, (2) perform local
computation, and (3) send messages to neighbors. Initially, nodes are aware only of their neighbors;
input values (if any) are assumed to be fed by the environment at time 0. Output values are placed
in special output-registers. In each round, each edge can carry a message of $B$ bits for some given
parameter $B$ of the model; we assume that $B \in \Theta(\log n)$ throughout this paper.

A basic observation in this model is that we may assume, without loss of generality, that we have 
a broadcast facility available, as formalized in the following lemma. 

Lemma 2.1 Suppose each $v \in V$ holds $m_v \ge 0$ messages of $O(\log n)$ bits each, for a total of $M :=
\sum_{v \in V} m_v$ strings. Then all nodes in the graph can receive these $M$ messages within $O(M + \mathrm{HD})$
rounds.

Proof: Construct a BFS tree rooted at, say, the node $r$ with smallest identifier ($O(\mathrm{HD})$ rounds). All
nodes send their messages to their parents and forward the messages received from their children to their
parent as well, until the root holds all messages. Since no edge needs to carry more than $M$ messages,
this requires $O(M + \mathrm{HD})$ rounds. Finally all messages are broadcast over the tree,
completing in another $O(M + \mathrm{HD})$ rounds. □

In the following, we will use this lemma implicitly whenever stating that some information is "broadcast"
or "announced to all nodes."
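
The collection phase in the proof of Lemma 2.1 can be sketched as a small simulation. This is only an illustration under assumptions of ours: the tree is given directly as a parent map (the BFS construction is omitted), each node forwards at most one message per round in FIFO order, and the names `upcast_rounds` and `depth` are hypothetical helpers, not part of the paper.

```python
from collections import deque

def upcast_rounds(parent, messages):
    """Pipelined collection of all messages at the root of a tree.

    parent[v] is v's parent (None for the root); messages[v] lists the messages
    initially held by v. Each round, every non-root node forwards at most one
    queued message to its parent. Returns (rounds used, messages at the root)."""
    queues = {v: deque(msgs) for v, msgs in messages.items()}
    total = sum(len(msgs) for msgs in messages.values())
    root = next(v for v, p in parent.items() if p is None)
    rounds = 0
    while len(queues[root]) < total:
        # Messages sent in this round become available to the parent only afterwards.
        in_flight = [(parent[v], queues[v].popleft())
                     for v in parent if parent[v] is not None and queues[v]]
        for target, msg in in_flight:
            queues[target].append(msg)
        rounds += 1
    return rounds, list(queues[root])

def depth(parent):
    """Maximum hop distance from any node to the root."""
    def d(v):
        return 0 if parent[v] is None else 1 + d(parent[v])
    return max(d(v) for v in parent)

# Path 0-1-2-3 rooted at 0, with an extra leaf 4 attached to node 1.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 1}
messages = {0: ["a"], 1: [], 2: ["b"], 3: ["c", "d"], 4: ["e"]}
rounds, collected = upcast_rounds(parent, messages)
print(rounds, collected)
```

Here M = 5 and the tree has depth 3; the simulation finishes within M + depth rounds, in line with the $O(M + \mathrm{HD})$ bound. The subsequent broadcast down the tree is symmetric and is not simulated.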

2.2 General Concepts 

We use extensively "soft" asymptotic notation that ignores polylogarithmic factors. Formally, we say
that $g(n) \in \tilde{O}(f(n))$ if and only if there exists a constant $c \in \mathbb{R}$ such that $g(n) \le f(n) \log^c(f(n))$ for
all but finitely many values of $n \in \mathbb{N}$. Analogously, $f(n) \in \tilde{\Omega}(g(n))$ iff $g(n) \in \tilde{O}(f(n))$, $\tilde{\Theta}(f(n)) :=
\tilde{O}(f(n)) \cap \tilde{\Omega}(f(n))$, $g(n) \in \tilde{o}(f(n))$ iff for each fixed $c \in \mathbb{R}$ it holds that $\lim_{n \to \infty} g(n) \log^c(f(n))/f(n) =
0$, and $g(n) \in \tilde{\omega}(f(n))$ iff $f(n) \in \tilde{o}(g(n))$.

To model probabilistic computation, we assume that each node has access to an infinite string of 
independent unbiased random bits. When we say that a certain event occurs "with high probability" 
(abbreviated "w.h.p."), we mean that the probability of the event not occurring can be set to be less
than $1/n^c$ for any desired constant $c$, where the probability is taken over the strings of random bits.

³ We remark that our results can be easily extended to non-negative edge weights by employing appropriate symmetry
breaking mechanisms.
2.3 Some Graph-Theoretic Concepts



A path $p$ connecting $v, u \in V$ is a sequence of nodes $(v = v_0, \ldots, v_k = u)$ such that for all $0 \le i < k$,
$(v_i, v_{i+1})$ is an edge in $G$. Let $\mathrm{paths}(v, u)$ denote the set of all paths connecting nodes $v$ and $u$. We
use the following unweighted concepts.

• The hop-length of a path $p$, denoted $\ell(p)$, is the number of edges in it.

• The hop distance $\mathrm{hd} : V \times V \to \mathbb{N}_0$ is defined as $\mathrm{hd}(v, u) := \min\{\ell(p) \mid p \in \mathrm{paths}(v, u)\}$.

• The hop diameter of a graph $G = (V, E, W)$ is $\mathrm{HD} := \max_{v, u \in V}\{\mathrm{hd}(v, u)\}$.

We use the following weighted concepts. 

• The weight of a path $p$, denoted $W(p)$, is its total edge weight, i.e., $W(p) := \sum_{i=1}^{k} W(v_{i-1}, v_i)$.

• The weighted distance $\mathrm{wd} : V \times V \to \mathbb{R}_0^+$ is defined by $\mathrm{wd}(v, u) := \min\{W(p) \mid p \in \mathrm{paths}(v, u)\}$.

• The weighted diameter of $G$ is $\mathrm{WD} := \max\{\mathrm{wd}(v, u) \mid v, u \in V\}$.

The following concepts mix weighted and unweighted ones.

• Given $h \in \mathbb{N}$ and two nodes $v, u \in V$ with hop distance $\mathrm{hd}(v, u) \le h$, we define the $h$-weighted
distance $\mathrm{wd}_h(v, u)$ to be the weight of the lightest path connecting $v$ and $u$ with at most $h$
hops, i.e., $\mathrm{wd}_h(v, u) := \min\{W(p) \mid p \in \mathrm{paths}(v, u) \text{ and } \ell(p) \le h\}$. If $\mathrm{hd}(v, u) > h$, we define
$\mathrm{wd}_h(v, u) := \infty$. (Note that $\mathrm{wd}_h$ does not satisfy the triangle inequality.)

• The shortest paths diameter of a graph, denoted SPD, is the maximal number of hops in shortest
paths: $\mathrm{SPD} := \max_{v, u \in V}\{\min\{\ell(p) \mid p \in \mathrm{paths}(v, u),\ W(p) = \mathrm{wd}(v, u)\}\}$.

Finally, given a node $v$ and an integer $i \ge 0$, we define $\mathrm{ball}_v(i)$ to be the set of the $i$ nodes
that are closest to $v$ (according to wd, where identifiers are used to break symmetry): $\mathrm{ball}_v(i) :=
\{u : |\{w : (\mathrm{wd}(v, w), w) < (\mathrm{wd}(v, u), u)\}| < i\}$, where pairs are compared lexicographically. Note that our
concept of ball differs from the usual one: we define a ball by its center and volume, namely the number of
nodes it contains (and not by its center and radius).

We have the following immediate property. 

Lemma 2.2 Let $v, u \in V$. If $u \in \mathrm{ball}_v(i)$ for some $i \in \mathbb{N}$, then $\mathrm{wd}(v, u) = \mathrm{wd}_j(v, u)$ for all $j \ge i - 1$.

Proof: Clearly $\mathrm{wd}(v, u) \le \mathrm{wd}_j(v, u) \le \mathrm{wd}_{i-1}(v, u)$, and it therefore suffices to show that $\mathrm{wd}_{i-1}(v, u) =
\mathrm{wd}(v, u)$. Let $p = (v = v_0, v_1, \ldots, v_k = u)$ be a shortest path from $v$ to $u$. Since edge weights are
strictly positive, we have that all the $k$ nodes $v_0, \ldots, v_{k-1}$ are strictly closer than $u$ to $v$. Hence, since
$u \in \mathrm{ball}_v(i)$, we have that $i \ge k + 1$. It follows that $\mathrm{wd}(v, u) = \mathrm{wd}_{i-1}(v, u)$ and we are done. □
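
The definitions of this section can be checked on a toy instance. The sketch below is an illustration only: the helper names `wd_h`, `wd`, and `ball` mirror the notation above, the example graph is an assumption of ours, and the centralized computation is not how the distributed algorithm obtains these values.

```python
import math

def wd_h(n, edges, src, h):
    """h-hop weighted distances from src: Bellman-Ford stopped after h iterations."""
    dist = [math.inf] * n
    dist[src] = 0
    for _ in range(h):
        new = dist[:]
        for u, v, w in edges:
            # Relax the undirected edge {u, v} using the previous iteration's values.
            new[v] = min(new[v], dist[u] + w)
            new[u] = min(new[u], dist[v] + w)
        dist = new
    return dist

def wd(n, edges, src):
    """Weighted distances: shortest paths have at most n - 1 hops."""
    return wd_h(n, edges, src, n - 1)

def ball(n, edges, v, i):
    """The i nodes closest to v, with ties broken by identifier."""
    d = wd(n, edges, v)
    return sorted(range(n), key=lambda u: (d[u], u))[:i]

n = 5
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5), (3, 4, 2), (0, 4, 9)]
print(wd(n, edges, 0))          # exact weighted distances from node 0
print(wd_h(n, edges, 0, 1)[3])  # with one hop, node 3 is only reachable via the heavy edge
print(ball(n, edges, 0, 3))     # the three nodes closest to node 0 (including node 0)

# Lemma 2.2: for u in ball_v(i), the (i-1)-hop distance already equals wd(v, u).
lemma_holds = all(wd_h(n, edges, v, i - 1)[u] == wd(n, edges, v)[u]
                  for v in range(n) for i in range(1, n + 1)
                  for u in ball(n, edges, v, i))
```

Node 3 illustrates the difference between $\mathrm{wd}$ and $\mathrm{wd}_h$: its weighted distance from node 0 is 3 along a three-hop path, whereas $\mathrm{wd}_1(0, 3) = 5$.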



3 Problem Statement and Lower Bounds 
3.1 The Routing Problem 

In the routing table construction problem (abbreviated RTC), the local input at a node is the weight
of incident edges, and the output at each node $v$ consists of (i) a unique label $\lambda(v)$ and (ii) a function
"$\mathrm{next}_v$" that takes a destination label $\lambda$ and produces a neighbor of $v$, such that given the label $\lambda(u)$
of any node $u$, and starting from any node $v$, we can reach $u$ from $v$ by following the next pointers.
Formally, the requirement is as follows. Given a start node $v$ and a destination label $\lambda(u)$, let $v_0 = v$
and define $v_{i+1} = \mathrm{next}_{v_i}(\lambda(u))$ for $i \ge 0$. Then for some $i$ we must have $v_i = u$.

The performance of a solution is measured in terms of its stretch: A route is said to have stretch
$\rho \ge 1$ if its total weight is no more than $\rho$ times the weighted distance between its endpoints, and a
solution to RTC is said to have stretch $\rho$ if all the routes it induces have stretch at most $\rho$.
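
As a small illustration of these definitions, the sketch below follows next pointers and computes the stretch of the resulting route. It assumes identity labels ($\lambda(v) = v$) for brevity, and the names `route` and `stretch` as well as the example tables are hypothetical, not taken from the paper.

```python
def route(next_hop, weight, start, dest, max_hops):
    """Follow next pointers from `start` until `dest` is reached.

    next_hop[v][dest] is the neighbor chosen by v for destination dest, and
    weight[frozenset((a, b))] is the weight of edge {a, b}.
    Returns (path, total weight); raises if dest is not reached in time."""
    path, total, cur = [start], 0, start
    while cur != dest:
        if len(path) > max_hops:
            raise RuntimeError("destination not reached: routing loop?")
        nxt = next_hop[cur][dest]
        total += weight[frozenset((cur, nxt))]
        path.append(nxt)
        cur = nxt
    return path, total

def stretch(route_weight, weighted_distance):
    """Ratio between the weight of the route and the true weighted distance."""
    return route_weight / weighted_distance

# Triangle with a heavy edge {0, 2}; the weighted distance between 0 and 2 is 2.
weight = {frozenset((0, 1)): 1, frozenset((1, 2)): 1, frozenset((0, 2)): 5}
next_hop = {0: {1: 1, 2: 2},   # node 0 sends traffic for 2 over the heavy edge
            1: {0: 0, 2: 2},
            2: {0: 1, 1: 1}}   # node 2 routes to 0 via node 1

path_02, w_02 = route(next_hop, weight, 0, 2, max_hops=3)
path_20, w_20 = route(next_hop, weight, 2, 0, max_hops=3)
print(path_02, stretch(w_02, 2))
print(path_20, stretch(w_20, 2))
```

The route from 0 to 2 has stretch 2.5 and the reverse route has stretch 1, so this (deliberately poor) table has stretch at least 2.5 as a solution to RTC.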

Variants. Routing appears in many incarnations. We list a few important variants below. 

Name-independent routing. Our definition of RTC allows for node relabeling. This is the case,
as mentioned above, in the Internet. The case where no such relabeling is allowed (which can be
formalized by requiring $\lambda$ to be the identity function), is called name-independent routing.

It can be shown that assigning new labels to the nodes is unavoidable by proving that any (randomized)
algorithm achieving polylogarithmic (expected) stretch without relabeling must run for $\tilde{\Omega}(n)$
rounds. Formally, we can prove the following.

Theorem 3.1 In the CONGEST model, any algorithm for RTC that produces name-independent
stateful routing with expected average stretch $\rho$ requires $\Omega(n/(\rho^2 \log n))$ time.

Stateful routing. The routing problem as defined above is stateless in the sense that routing a
packet is done regardless of the path it traversed so far. One may also consider stateful routing, where
while being routed, a packet may gather information that helps it navigate later (one embodiment of
this idea in Internet routing today is MPLS, where packets are temporarily piggybacked with extra
headers). Note that the set of routes to a single destination in stateless routing must constitute a tree,
whereas in stateful routing even a single route may contain a cycle. Formally, in stateful routing the
label of the destination may change from one node to another: The $\mathrm{next}_v$ function outputs both the
next hop (a neighbor node), and a new label $\lambda'$, used in the next hop.

It can be shown that assigning new labels to the nodes is unavoidable by proving that any (randomized)
algorithm achieving polylogarithmic (expected) stretch without relabeling must run for $\tilde{\Omega}(n)$
rounds. What might come as a surprise here is that the result also applies to stateful routing.

Theorem 3.2 In the CONGEST model, any algorithm for RTC that produces name-independent
routing with (expected) average stretch $\rho$ requires $\Omega(n/(\rho^2 \log n))$ time.

3.2 The Distance Approximation Problem 

The distance approximation problem is akin to the routing problem. Again, each node $v$ outputs a
label $\lambda(v)$, but now, $v$ needs to construct a function $\mathrm{dist}_v : \lambda(V) \to \mathbb{R}_0^+$ (the table) such that for all
$w \in V$ it holds that $\mathrm{dist}_v(w) \ge \mathrm{wd}(v, w)$. The stretch of the approximation for a given node $w$ is
$\mathrm{dist}_v(w)/\mathrm{wd}(v, w)$, and the solution has stretch $\rho \ge 1$ if $\mathrm{dist}_v(w) \le \rho\, \mathrm{wd}(v, w)$ for all $v, w \in V$.

Similarly to routing, we call a scheme name-independent if $\lambda$ is the identity function. Since we
require distance estimates to be produced without communication, there is no "stateful" distance
approximation.
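
A tiny example may help to see how an estimate can dominate the true distance while having bounded stretch. The sketch below estimates every distance by a detour through one fixed landmark, which can never underestimate by the triangle inequality. This single-landmark estimator and the names `landmark_estimate` and `max_stretch` are assumptions made for illustration; it is not the scheme constructed in this paper.

```python
def landmark_estimate(wd, landmark, v, w):
    """Estimate wd(v, w) by the detour through a fixed landmark node."""
    return 0 if v == w else wd[v][landmark] + wd[landmark][w]

def max_stretch(wd, estimate):
    """Largest ratio estimate(v, w) / wd(v, w) over all pairs of distinct nodes."""
    nodes = range(len(wd))
    return max(estimate(v, w) / wd[v][w] for v in nodes for w in nodes if v != w)

# Weighted distances on the unit-weight path 0 - 1 - 2.
wd = [[0, 1, 2],
      [1, 0, 1],
      [2, 1, 0]]

def via_end(v, w):
    """Estimates routed through node 0, an end of the path."""
    return landmark_estimate(wd, 0, v, w)

def via_mid(v, w):
    """Estimates routed through node 1, the middle of the path."""
    return landmark_estimate(wd, 1, v, w)

dominates = all(via_end(v, w) >= wd[v][w] for v in range(3) for w in range(3))
print(max_stretch(wd, via_end), max_stretch(wd, via_mid))
```

With the landmark at the end of the path, the pair (1, 2) is estimated as 3 instead of 1, giving stretch 3; with the landmark in the middle, all estimates are exact.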

3.3 Hardness of Name-Independent Distributed Table Construction 

While name-independence may be desirable, our routing and distance approximation algorithm makes
heavy use of relabeling. This is unavoidable for fast construction, because, as the following two
theorems show, any name-independent scheme of polylogarithmic stretch requires $\tilde{\Omega}(n)$ rounds for
table construction. The lower bound holds even for stateful routing and average stretch. Moreover,
since the construction below is generic, intuitively it implies that there is no reasonable restriction,
be it in terms of topology, edge weights, or node degrees, that permits fast construction of
name-independent routing tables.⁴

Theorem 3.3 In the CONGEST model, any name-independent routing scheme of (expected) average
stretch $\rho$ requires $\Omega(n/(\rho^2 \log n))$ rounds for table construction. This holds even if all edge weights are
1, the graph is a tree of constant depth, and the node identifiers are $1, \ldots, n$.

Proof: We assume w.l.o.g. that all set sizes we use in this proof are integer and that nodes may send 
no more than exactly logn G N bits over each edge in each round. Consider the following family of 
trees of depth 2. The root is connected to ni G ©(/o) inner nodes, each of which has n2 children; 
denote by I and L the respective sets of nodes. All edges have weight 1, i.e., the maximal simple path 
weight is 4. 

We assign the identifiers l,...,n uniformly at random to the nin2 leaves (w.l.o.g., we neglect 
that the total number of nodes is nin2 + ni + 1 in the following and use n instead). Consider any 
deterministic algorithm constructing routing tables within r G N rounds. From each node in /, the 
root receives at most rlogn bits, hence there are at most 2^"'i'°s"' possible routing tables at the root. 
Now consider the n!/(n2!)"^ possible partitions of the leaf identifiers to the subtrees rooted at nodes 
from /. We bound the number of such partitions for which a fixed routing table at the root may serve 
a uniformly random routing request with probability at least p correctly. This requirement translates 
to at least pn identifiers being exactly in the subtree where the routing table points to; we have (^) 
possible choices for these identifiers. The remaining (1 — p)n identifiers may be distributed arbitrarily 
to the remaining subtrees. Depending on the distribution of the pn identifiers we already selected, 
the number of possibilities for this may vary. Using standard arguments it can be shown that this 
quantity is maximized if the pn identifiers are distributed evenly among the subtrees, i.e., each of them 
contains pn2 of them. We conclude that no routing table can serve a uniform request with probability 
at least p for more than — p)n)!/((l — p)n2)!"^ of the possible input partitions. Considering 

the number possible routing tables and the total number of input partitions n!/(n2!)"^, we have that 



''The lower bound graph can be adapted to be a balanced binary tree, weakening the lower bound on the stretch by 
factor logn. 




< 2 



,rni log n 
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We distinguish two cases, the first being p < e^/ni. We seek to upper bound p in the second case 
as well, where p > e^/rai. Clearly the l.h.s. of the above inequality is increasing in p G [0, 1]. Together 
with Stirling's approximation x\ G e^^"''^^))^^^"^"^) we can bound 



The assumption that p > e^/ni thus implies (for sufficiently large n) that r > n2/(ni logn). 

Now condition on the event that for the given routing request the table does not lead to the correct 
subtree. We fix the (uniformly random) subset of leaf identifiers in the subtree S the root's routing 
table points to, and conclude that the set of remaining identifiers is a uniformly random subset of 
n — n2 — 1 leaf identifiers plus the destination's identifier. Moreover, the destination is uniformly 
random from this subset and the remaining identifiers are uniformly distributed among the remaining 
subtrees. We delete S from the graph (since clearly there is no reason to route to S again) and 
examine the next routing decision of the root. We observe that the situation is identical to the initial 
setting except that ni is replaced by ni — 1. Note also that S contained no valuable information: We 
deleted S and the identifiers in S from the graph, and any other information known to nodes in S 
must have been communicated to S by the root. Hence, repeating the above arguments, we sec that 
the probability to find the destination in the second attempt conditioned on the first having failed is 
at most e^/(ni — 1) or r > n2/(ni logn). By induction on the number of routing attempts, we infer 
that for i G {1, . . . , ni/2}, the probability pi to succeed in the i*'* attempt to route from the root node 
to the subtree containing the destination (conditional on the previous attempts having failed) is upper 
bounded by 2e^/ni unless r > n2/(ni logn). 

Overall, the probability that a deterministic algorithm constructing routing tables within r < 
n2/(nilogn) rounds fails to serve a uniformly random routing request at the root for uniformly 
distributed leaf identifiers using fewer than ni/2 attempts (i.e., visits of the root on the routing path) 
is lower bounded by 



Note that an analogous argument holds for routing requests issued at other nodes, since they
have a large probability to require routing to a different subtree. Therefore, the average stretch of
any deterministic routing algorithm running for fewer than $n_2/(n_1 \log n)$ rounds is at least $\Omega(n_1)$.
By Yao's principle, the expected average stretch of randomized algorithms running for fewer than
$n_2/(n_1 \log n)$ rounds thus must also be in $\Omega(n_1)$. Recalling that $n_1 \in \Theta(\rho)$ and $n_2 = n/\rho$, we get
that $r \in o(n/(\rho^2 \log n))$ rounds are insufficient to achieve (expected) average stretch $\rho$, proving the
statement of the theorem. □

A streamlined version of the argument shows that a similar lower bound applies to distance
approximation.



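\[
\begin{aligned}
\frac{(pn)!\,\big(((1-p)n_2)!\big)^{n_1}}{(n_2!)^{n_1}}
&= \frac{(e^2 n_2)!\,\big(((1-e^2/n_1)n_2)!\big)^{n_1}}{(n_2!)^{n_1}} \\
&\in e^{(1-o(1))\left(e^2 n_2(\ln(e^2 n_2)-1) + n(1-e^2/n_1)(\ln((1-e^2/n_1)n_2)-1) - n(\ln n_2-1)\right)} \\
&\subseteq e^{(1-o(1))\left(e^2 n_2(\ln n_2+1) - e^2 n_2(\ln n_2-1) + n\ln(1-e^2/n_1)\right)} \\
&\subseteq e^{(1-o(1))(2e^2 n_2 - e^2 n_2)} \\
&= e^{(e^2-o(1))n_2}.
\end{aligned}
\]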
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Theorem 3.4 In the CONGEST model, any name-independent distance approximation scheme of 
(expected) average stretch $\rho$ requires $\Omega(n/\log n)$ rounds for table construction in graphs with edge
weights of 1 and $\omega_{\max} \in O(\rho)$ only. This holds even if the graph is a star and the node identifiers are
$1, \ldots, n$.

Proof: Again, we assume w.l.o.g. that all considered values are integers and that the link capacity is $\log n$
bits per round. Suppose $G$ is a star with $n$ leaves (we neglect w.l.o.g. the center in the node count). Each
edge has weight $\omega_{\max}$ independently with probability $1/2$; the remaining edges have weight 1.

Condition on the event that the edge incident to some fixed node $v$ has weight 1. Thus, there are two
possible path weights to other nodes: $\omega_{\max} + 1$ and 2. Within $r$ rounds, the node receives at most
$r \log n$ bits, yielding $2^{r \log n}$ possible distance estimate configurations. In order to be $\rho$-approximate for
$\rho < (\omega_{\max} + 1)/2$ and a given other leaf, $v$'s table must output an estimate of at most $2\rho < \omega_{\max} + 1$
in case the leaf's edge has weight 1 (i.e., the path has weight 2) and at least $\omega_{\max} + 1$ if the leaf's edge
has weight $\omega_{\max}$. Thus any given table can be correct for a given leaf for only one of the two possible
choices of the leaf's edge's weight. There are $2^n$ possible edge weight assignments. By the above
observation, a fixed table is $\rho$-approximate for a given destination with probability $1/2$. By Chernoff's
bound, this implies that the probability that a fixed table is correct for a fraction of $3/4$ of the
destinations is bounded by $2^{-\Omega(n)}$. By the union bound, it follows that for the given uniformly random
edge weight assignment, the probability that the computed table is correct for a fraction of $3/4$ of the
destinations is upper bounded by $2^{-\Omega(n)} 2^{r \log n}$. This implies that $r \in \Omega(n/\log n)$ or the average
stretch of node $v$'s table must be $\Omega(\omega_{\max})$.

By symmetry, the same applies to all nodes incident to an edge of weight 1. By Chernoff's bound,
w.h.p. at least one quarter of the nodes satisfies this property, i.e., the probability mass of the events
where fewer than $n/4$ edges have weight 1 is negligible. By linearity of expectation, it follows that
any deterministic algorithm running for $o(n/\log n)$ rounds exhibits average stretch $\Omega(\omega_{\max})$, and by
Yao's principle this extends to the expected stretch of randomized algorithms. □
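
To make the counting step of the proof explicit, the following short calculation spells out when the union bound over all $2^{r \log n}$ tables is non-trivial. The constant $c > 0$ is an assumption introduced only for illustration; it stands for the constant hidden in the $2^{-\Omega(n)}$ Chernoff term.

```latex
\[
2^{-cn} \cdot 2^{r \log n} \;=\; 2^{\,r \log n - cn} \;<\; 1
\quad\Longleftrightarrow\quad
r \;<\; \frac{c\,n}{\log n}.
\]
```

In particular, for $r \in o(n/\log n)$ the exponent $r \log n - cn$ tends to $-\infty$, so the probability that the computed table is correct for a $3/4$ fraction of the destinations vanishes.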

Consequently, in the remainder of the paper we shall consider name-dependent schemes only.

3.4 Hardness of Diameter Estimation

In [12], it is shown that approximating the hop-diameter of a network within a factor smaller than 1.5
cannot be done in the CONGEST model in $\tilde{o}(\sqrt{n})$ time. Here, we prove a hardness result for the
weighted diameter, formally stated as follows.

Theorem 3.5 For any $\omega_{\max} \ge \sqrt{n}$, there is a function $\alpha(n) \in \Omega(\omega_{\max}/\sqrt{n})$ such that the following
holds. In the family of weighted graphs of hop-diameter $HD \in O(\log n)$ and edge weights 1 and
$\omega_{\max}$ only, an (expected) $\alpha(n)$-approximation of the weighted diameter requires $\tilde{\Omega}(\sqrt{n})$ communication
rounds in the CONGEST model.

Proof sketch: We construct a graph $G_n$ with $\Theta(n)$ nodes. Let $m = \sqrt{n} \in \mathbb{N}$. The graph consists of
the following three conceptual parts. Figure 1 illustrates a part of the construction.

• Nodes $v_{i,j}$ for $1 \le i, j \le m$. These nodes are connected as $m$ paths of length $m - 1$. All path
edges are of weight 1.

• A star rooted at an Alice node, where the children are $v_{1,1}, \ldots, v_{m,1}$, and similarly, a star rooted
at a Bob node, whose leaves are $v_{1,m}, \ldots, v_{m,m}$. We specify the weights of these edges later.

• For each $1 \le j \le m$ there is a node $u_j$ connected to all nodes $v_{i,j}$, $1 \le i \le m$, in "column" $j$,
with edges of weight $\omega_{\max}$. In addition, there is a binary tree whose leaves are the nodes $u_j$. All
tree edges have weight 1. Finally, Alice and Bob are connected to $u_1$ and $u_m$, respectively, by
edges of weight 1.

Figure 1: An illustration of the graph used in the proof of Theorem 3.5. Thick edges denote edges of
weight $\omega_{\max}$, other edges are of weight 1. The shaded triangle represents a binary tree.

It is easy to see that the hop-diameter of $G_n$ is $O(\log n)$: the hop-distance from any node to one of
the nodes $u_j$ is $O(\log n)$, and the distance between any two such nodes is also $O(\log n)$. However,
the majority of the short paths guaranteeing the small diameter passes through very few nodes close
to the root of the binary tree. Consequently, it takes a long time to exchange a large number of bits
between Alice and Bob, implying that it is hard to decide set disjointness for sets held by Alice and
Bob in the CONGEST model. Specifically, the following fact is a direct corollary from [7].

Fact 3.1 ([7]) Let $\mathcal{M} := \{1, \ldots, m\}$. Suppose that node Alice holds a set $A \subseteq \mathcal{M}$ and that node Bob
holds a set $B \subseteq \mathcal{M}$. Then finding whether $A \cap B = \emptyset$ takes $\tilde{\Omega}(m)$ rounds in the CONGEST model,
even for randomized algorithms.

We now show that if the diameter of $G_n$ can be approximated within a factor of $\alpha(n) \in \Omega(\omega_{\max}/\sqrt{n})$ in time $T$ in
the CONGEST model, then the set disjointness problem can be solved in time $T + 1$. To
this end, we set the edge weights of the stars rooted at Alice and Bob as follows: for all $i \in \{1, \ldots, m\}$,
the edge from Alice to $v_{i,1}$ has weight $\omega_{\max}$ if $i \in A$ and weight 1 otherwise; likewise, the edge from Bob to
$v_{i,m}$ has weight $\omega_{\max}$ if $i \in B$ and weight 1 otherwise.

Note that given $A$ at Alice and $B$ at Bob, we can inform the nodes $v_{i,1}$ and $v_{i,m}$ of these weights in
one round. Now run any algorithm that outputs a value between $WD$ and $\alpha(n) WD = \omega_{\max} WD/(\sqrt{n} + C \log n)$
(for a suitable constant $C$) within $T$ rounds, and output "$A$ and $B$ are disjoint" if the outcome
is at most $\omega_{\max}$ and output "$A$ and $B$ are not disjoint" otherwise.
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It remains to show that the outcome of this computation is correct for any inputs A and B and the 
statement of the theorem will follow from Fact 3.1 (recall that the number of nodes of $G_n$ is $\Theta(n)$).
Suppose first that $A \cap B = \emptyset$. Then for each node $v_{i,j}$, there is a path of at most $\sqrt{n}$ edges of weight 1
connecting it to Alice or Bob, and Alice and Bob are connected to all nodes in the binary tree and each
other via $O(\log n)$ hops in the binary tree (whose edges have weight 1 as well). Hence the weighted
diameter of $G_n$ is $\sqrt{n} + O(\log n)$ in this case and the output is correct (where we assume that $C$ is
sufficiently large to account for the $O(\log n)$ term). Now suppose that $i \in A \cap B$. In this case each
path from node $v_{i,1}$ to Bob contains an edge of weight $\omega_{\max}$, since the edges from Alice to $v_{i,1}$ and Bob
to $v_{i,m}$ as well as those connecting $v_{i,j}$ to $u_j$ have weight $\omega_{\max}$. Hence, the weighted distance from $v_{i,1}$
to Bob is strictly larger than $\omega_{\max}$ and the output is correct as well. This shows that set disjointness
is decided correctly and therefore the proof is complete. □
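
The construction can be checked numerically on small instances. The following is a minimal sketch, not the authors' code: the function names, the node encoding, the heap-style indexing of the binary tree, and the restriction to $m$ being a power of two are all assumptions made for illustration. It builds $G_n$ with the star weights determined by $A$ and $B$ and computes the weighted diameter with Dijkstra's algorithm.

```python
import heapq
from itertools import count


def build_graph(m, omega_max, A, B):
    """Build the lower-bound graph as {node: {neighbor: weight}}.

    Assumption for this sketch: m is a power of two, so that the binary
    tree over u_1..u_m can be laid out with heap indices 1..2m-1.
    """
    assert m >= 2 and m & (m - 1) == 0
    g = {}

    def add(x, y, w):
        g.setdefault(x, {})[y] = w
        g.setdefault(y, {})[x] = w

    for i in range(1, m + 1):
        for j in range(1, m):                      # m row paths, weight 1
            add(("v", i, j), ("v", i, j + 1), 1)
        add("Alice", ("v", i, 1), omega_max if i in A else 1)
        add("Bob", ("v", i, m), omega_max if i in B else 1)
        for j in range(1, m + 1):                  # column nodes u_j, heavy edges
            add(("u", j), ("v", i, j), omega_max)

    def tree_node(k):                              # heap index -> node name
        return ("u", k - m + 1) if k >= m else ("t", k)

    for k in range(2, 2 * m):                      # tree edges, weight 1
        add(tree_node(k), tree_node(k // 2), 1)
    add("Alice", ("u", 1), 1)
    add("Bob", ("u", m), 1)
    return g


def distances_from(g, src):
    """Dijkstra; a counter breaks ties so mixed node types are never compared."""
    dist = {src: 0}
    tie = count()
    heap = [(0, next(tie), src)]
    while heap:
        d, _, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, w in g[x].items():
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, next(tie), y))
    return dist


def weighted_diameter(g):
    return max(max(distances_from(g, s).values()) for s in g)


if __name__ == "__main__":
    print(weighted_diameter(build_graph(4, 100, {1, 2}, {3, 4})))  # disjoint sets
    print(weighted_diameter(build_graph(4, 100, {1, 2}, {2, 3})))  # 2 is in both sets
```

In this sketch, disjoint inputs keep the diameter at most $2m + 2\log_2 m + 2$ (each row reaches Alice or Bob within $m$, and Alice and Bob are $2\log_2 m + 2$ apart via the tree), whereas a common element $i$ forces the distance from $v_{i,1}$ to Bob above $\omega_{\max}$.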

3.5 Hardness of Name-Dependent Distributed Table Construction 

A lower bound on name-dependent distance approximation follows directly from Theorem 3.5. 

Corollary 3.6 For any $\omega_{\max} \ge \sqrt{n}$, there is a function $\alpha(n) \in \Omega(\omega_{\max}/\sqrt{n})$ such that the following
holds. In the family of weighted graphs of hop-diameter $HD \in O(\log n)$ and edge weights 1 and $\omega_{\max}$
only, constructing labels of size $\tilde{o}(\sqrt{n})$ and tables for distance approximation of (expected) stretch $\alpha(n)$
requires $\tilde{\Omega}(\sqrt{n})$ communication rounds in the CONGEST model.

Proof: We use the same construction as in the previous proof; however, now we need to solve the
disjointness problem using the tables and labels. Using the same setup, we run the assumed table and
label construction algorithm. Afterwards, we transmit, e.g., the label of Alice to all nodes $v_{i,1}$. This
takes $\tilde{o}(\sqrt{n})$ rounds due to the size restriction of the labels. Then we query the estimated distance to
Alice at the nodes $v_{i,1}$ and collect the results at Alice. Analogously to the proof of Theorem 3.5, the
maximum of these values is large if and only if the input satisfies $A \cap B \neq \emptyset$. Since transmitting
the label costs only $\tilde{o}(\sqrt{n})$ additional rounds, the same asymptotic lower bound as in Theorem 3.5
follows. □

A variation of the theme shows that stateless routing requires $\tilde{\Omega}(\sqrt{n})$ time.

Corollary 3.7 For any $\omega_{\max} \ge \sqrt{n}$, there is a function $\alpha(n) \in \Omega(\omega_{\max}/\sqrt{n})$ such that the following
holds. In the family of weighted graphs of hop-diameter $HD \in O(\log n)$ and edge weights 1 and $\omega_{\max}$
only, constructing stateless routing tables of (expected) stretch $\alpha(n)$ with labels of size $\tilde{o}(\sqrt{n})$ requires
$\tilde{\Omega}(\sqrt{n})$ communication rounds in the CONGEST model.

Proof sketch: We consider the same graph as in the proof of Theorem 3.5 and input sets $A$ and $B$
at Alice and Bob, respectively, but we use a different assignment of edge weights.

• All edges incident to a node in the binary tree have weight $\omega_{\max}$.

• For each $i \in \{1, \ldots, m\}$, the edge from Alice to $v_{i,1}$ has weight $\omega_{\max}$ if $i \notin A$ and weight 1 otherwise.
Likewise, the edge from Bob to $v_{i,m}$ has weight $\omega_{\max}$ if $i \notin B$ and otherwise weight 1.

• The remaining edges (on the $m$ paths from $v_{i,1}$ to $v_{i,m}$) have weight 1.

Observe that the distance from Alice to Bob is $\sqrt{n} + 1$ if $A \cap B \neq \emptyset$ and strictly larger than $\omega_{\max}$ if
$A \cap B = \emptyset$. Once static routing tables for routing on paths of stretch at most $\omega_{\max}/(\sqrt{n} + 1)$ are set
up, e.g. Bob can decide whether $A$ and $B$ are disjoint as follows. Bob sends its label to Alice via the
binary tree (which takes time $\tilde{o}(\sqrt{n})$ if the label has size $\tilde{o}(\sqrt{n})$). Alice responds with "$i$" if the first
routing hop from Alice to Bob is node $v_{i,1}$ and $i \in A$ (i.e., the weight of the edge is 1), and "$A \cap B = \emptyset$"
otherwise (this takes $O(\log n)$ rounds). Bob then outputs "$A \cap B \neq \emptyset$" if Alice responded with "$i$" and
$i \in B$ (i.e., the weight of the routing path is $\sqrt{n} + 1$ since the edge from Bob to $v_{i,m}$ has weight 1) and
"$A \cap B = \emptyset$" otherwise.

If the output is "$A \cap B \neq \emptyset$", it is correct because $i \in A \cap B$. On the other hand, if it is "$A \cap B = \emptyset$",
the route from Alice to Bob must contain an edge of weight $\omega_{\max}$, implying by the stretch guarantee
that there is no path of weight $\sqrt{n} + 1$ from Alice to Bob. This in turn entails that $A \cap B = \emptyset$ due
to the assignment of weights and we conclude that the output is correct also in this case. Hence the
statement of the corollary follows from Fact 3.1. □

We remark that Theorem 3.4, Theorem 3.5, Corollary 3.6, and Corollary 3.7 have in common that,
if edge weight $\omega_{\max}$ is permitted, the stated stretch cannot be achieved faster than the stated lower
bounds, even if the only other feasible edge weight is 1.

Finally, we note that the hop-diameter is also an obvious lower bound on the time required to 
approximate the weighted diameter, construct stateless routing tables, etc. since if the running time 
is smaller than HD, distant parts of the graph (in the sense of hop-distance) cannot influence the local 
output. 

4 Routing Algorithm 

Overview. To construct routing tables, one needs to learn about paths. Naive distributed algorithms
explore paths sequentially, adding one edge at a time, leading to potentially linear complexity, since
shortest weighted paths may be very long in terms of the number of edges. Our basic idea is to break
hop-wise long paths into small pieces by means of random sampling. Specifically, motivated by the
$\tilde{\Omega}(\sqrt{n})$ lower bound of Theorem 3.5, we select a random subset of $\Theta(\sqrt{n})$ nodes we call the routing
skeleton. It follows that, w.h.p., (1) any simple path of hop-length $\tilde{\Omega}(\sqrt{n})$ contains a skeleton node,
and (2) any node has a skeleton node among its closest $\tilde{O}(\sqrt{n})$ nodes. The route that our scheme
will select from a given source to a given destination depends on their distance: If the destination is
one of the $\tilde{O}(\sqrt{n})$ nodes closest to the source, routing will be done using a "short range scheme"
(see below); otherwise, the short range scheme is used to route from the source to the nearest skeleton
node, from which, using another scheme we call "long distance routing," we route to the skeleton node
closest to the destination node, and finally, another application of the short range scheme brings us
to the destination. Intuitively, we can split the problem into the following tasks:

1. Short range scheme: how to route efficiently from each node to its $\tilde{O}(\sqrt{n})$ closest nodes including
at least one skeleton node, and, conversely, from a skeleton node to all its "subordinates" (note
the asymmetry in this case).

2. Skeleton routing scheme: how to route between skeleton nodes efficiently.

The short range scheme is described in Section 4.2. We note that since a straightforward application
of multiple-source shortest paths may result in linear time, we develop a hierarchical structure to
solve the short-range routing. This hierarchy bears resemblance to the Thorup-Zwick distance oracle
algorithm [30]. Our long distance routing is described in Section 4.3. The main challenge there is to
build the skeleton graph; since it might be too dense, we sparsify it "on the fly" while constructing it.
This construction is implemented by adapting the spanner algorithm of Baswana and Sen [3] to our
setting.

We start by describing the variant of the Bellman-Ford algorithm we use as a basic building block
in Section 4.1.

Algorithm 1: BSP($h$, $\Delta$, $S$): Bounded shortest paths, computed at node $v \in V$.
input   : $h$ // range parameter: hop bound on path lengths, globally known
          $\Delta$ // overlap parameter: number of closest sources each node needs to detect, globally known
          source : $V \to SID \cup \{\bot\}$ // each $v$ knows source($v$); source($v$) $= \bot$ means $v$ is not a source
computes: For all $t \in \{1, \ldots, h\}$: weighted distance and the next hop from $v$ to each of the closest $\Delta$
          source node sets using paths of at most $t$ edges (or all such sets, if there are at most $\Delta$ within
          $t$ hops).
 1 if source($v$) $\neq \bot$ then $L_v(0) := \{(0, \mathrm{source}(v), v)\}$ else $L_v(0) := \emptyset$ // initialization
 2 for $t := 1$ to $h$ do
 3     send $L_v(t-1)$ to all neighbors; $L_v(t) := L_v(t-1)$
 4     foreach neighbor $u$ do
 5         receive $L_u(t-1)$
 6         foreach $(d_u, s_u, next_u) \in L_u(t-1)$ do // Bellman-Ford relaxation
 7             if $\exists (d_v, s_v, next_v) \in L_v(t)$ s.t. $s_v = s_u$ then
 8                 if $(d_u + W(u,v), u) < (d_v, next_v)$ then // comparisons are lexicographical
 9                     $L_v(t) := L_v(t) \setminus \{(d_v, s_v, next_v)\} \cup \{(d_u + W(u,v), s_u, u)\}$
10             else $L_v(t) := L_v(t) \cup \{(d_u + W(u,v), s_u, u)\}$
11     truncate $L_v(t)$ to smallest $\Delta$ entries // order is lexicographical
12 return $(L_v(1), \ldots, L_v(h))$



4.1 Bounded Shortest Paths 

We now describe a basic subroutine we use. Algorithm BSP, whose pseudocode is given in Algorithm 1,
is essentially a standard multiple-source distributed Bellman-Ford algorithm, with two restrictions:
first, the algorithm is run for only $h$ rounds (cf. Line 2); and second, nodes never report more than $\Delta$
sources (cf. Line 11).
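
As an illustration, the two restrictions can be simulated centrally in a few lines. The sketch below is not the authors' implementation: the function name `bsp`, the adjacency-map input format, the use of `None` for "not a source", and the choice to start each round from the node's previous list are assumptions made for this example. It processes all nodes round by round, which mimics the synchronous CONGEST execution without modelling message sizes.

```python
def bsp(graph, source, h, delta):
    """Simulate bounded shortest paths in synchronous rounds.

    graph:  {v: {u: weight}}, a symmetric adjacency map (assumed format)
    source: {v: source_id}; nodes missing from the map are not sources
    h:      range parameter (number of rounds)
    delta:  overlap parameter (entries kept per node and round)
    Returns {v: [L_v(0), ..., L_v(h)]}; each L_v(t) is a list of
    (distance, source_id, next_hop) tuples in lexicographical order.
    """
    tables = {}
    for v in graph:
        sid = source.get(v)
        tables[v] = [[(0, sid, v)] if sid is not None else []]

    for t in range(1, h + 1):
        new = {}
        for v in graph:
            # Assumption: start from the previous list of v.
            best = {s: (d, nxt) for d, s, nxt in tables[v][t - 1]}
            for u, w in graph[v].items():
                # "Receive" the list the neighbor held after round t-1.
                for d_u, s, _ in tables[u][t - 1]:
                    cand = (d_u + w, u)          # Bellman-Ford relaxation
                    if s not in best or cand < best[s]:
                        best[s] = cand
            entries = sorted((d, s, nxt) for s, (d, nxt) in best.items())
            new[v] = entries[:delta]             # keep the smallest delta entries
        for v in graph:
            tables[v].append(new[v])
    return tables


if __name__ == "__main__":
    graph = {
        "a": {"b": 1, "d": 10},
        "b": {"a": 1, "c": 1},
        "c": {"b": 1, "d": 1},
        "d": {"c": 1, "a": 10},
    }
    source = {"a": "A", "d": "D"}
    for level, entries in enumerate(bsp(graph, source, h=3, delta=2)["a"]):
        print(level, entries)
```

In the example, node `a` first learns source `D` over the direct edge of weight 10 and only in round 3 discovers the three-hop path of weight 3, which shows why the per-round lists differ.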

We consider a slightly extended variant of the algorithm: In the original algorithm, each node is
a "source" and the goal is to compute the distances of all nodes to it. Here we assume that (i) not
all nodes are sources, and (ii) sets of nodes may act as a single source, as if there were 0-weight edges
connecting them. Both extensions are modeled by the source function, which maps a node to $\bot$ if it is
not a source, or multiple nodes to the same source ID if they are in the same source set. We use $S$ to
denote the set of sources, i.e., $S = \{\mathrm{source}(v) \mid v \in V\} \setminus \{\bot\}$, and for each $s \in S$, the set of source
nodes of $s$ is $SN(s) := \{v \mid \mathrm{source}(v) = s\}$. Note that the source function uniquely determines the source sets
and vice versa. We assume that a source ID can be encoded using $O(\log n)$ bits.
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We analyze the algorithm leveraging the correctness of the basic Bellman-Ford algorithm. To this 
end, let us define Algorithm 1* by omitting Line 11 from Algorithm 1 and fixing $h = n - 1$. Observing
that Algorithm 1* is exactly the distributed Bellman-Ford algorithm, we may conclude the following
standard property.

Lemma 4.1 Fix an execution of Algorithm 1*. Denote by $L_v^*(t)$ for some $0 \le t \le n - 1$ and $v \in V$
the contents of the variable at node $v$ after $t$ iterations of Algorithm 1*. Then for each $(d, s, next)$
entry in $L_v^*(t)$ we have that $s$ is a source and $\mathrm{wd}_t(v, s) := \min_{u \in SN(s)}\{\mathrm{wd}_t(v, u)\} = d$, namely $d$ is the
length of the shortest path that consists of at most $t$ edges from $v$ to any node $u$ in the source set of $s$.
Moreover, $next$ is the next node on that shortest path from $v$ to $u$.

Lemma 4.1 says that running only $h$ iterations is sufficient if we are interested in paths of $h$ or fewer
edges only. We now consider the effect of repeatedly truncating the distance vector.

Lemma 4.2 Consider executions of Algorithm 1 and of Algorithm 1* on the same graph and with
the same source function. Let $L_v(t)$ and $L_v^*(t)$ denote the contents of the variable at node $v$ after
$t$ iterations under Algorithm 1 and under Algorithm 1*, respectively. Then $L_v(t)$ contains exactly the
smallest $\Delta$ entries of $L_v^*(t)$ with respect to lexicographical ordering (or the entire list, if $|L_v^*(t)| \le \Delta$).

Proof: By induction on $t$. The base case is $t = 0$, and the lemma clearly holds upon initialization
(Line 1). For the induction step, assume that the lemma holds for $t - 1 \in \{0, \ldots, h - 1\}$ at all nodes,
and consider iteration $t$ at some node $v$. By the induction hypothesis, we have that the message
received by $v$ from each neighbor $u$ at time $t$ under Algorithm 1 is exactly the smallest $\Delta$ entries sent by
node $u$ at time $t$ under Algorithm 1*, because these entries are computed at the end of iteration $t - 1$.
The lemma therefore follows from the fact that for any $k > 0$, the smallest $k$ entries of a union of sets
are contained in the union of the smallest $k$ entries from each set. □
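
The set-theoretic fact used in the last step can be checked mechanically. The following minimal sketch (helper names are hypothetical, chosen for this example) compares the smallest $k$ entries of a union with the union of the per-set smallest $k$ entries, using Python's lexicographical tuple order.

```python
import heapq


def smallest(entries, k):
    """The k lexicographically smallest distinct entries, as a set."""
    return set(heapq.nsmallest(k, set(entries)))


def check_truncation_property(lists, k):
    """True iff the smallest k entries of the union of all lists are
    contained in the union of the smallest k entries of each list."""
    union_all = set().union(*lists)
    union_truncated = set().union(*(smallest(entries, k) for entries in lists))
    return smallest(union_all, k) <= union_truncated


if __name__ == "__main__":
    neighbor_lists = [
        [(1, "A", "a"), (4, "B", "a"), (9, "C", "a")],
        [(2, "B", "c"), (3, "D", "c"), (7, "A", "c")],
    ]
    print(check_truncation_property(neighbor_lists, 2))
```

The reason it always holds: if an entry is among the $k$ smallest of the union, fewer than $k$ entries of the union precede it, hence fewer than $k$ entries of its own set precede it as well.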

Note that the information provided by $L_v(h)$ is insufficient for routing: since the $\Delta$ closest source
node sets may differ between neighbors, it may be the case that for some source identifier $s$ and two
neighbors $v$ and $u$ we have that $u$ is the next node from $v$ to $s$ in $L_v(h)$, but there is no entry for
source $s$ in $L_u(h)$! This occurs, for example, if in iteration $h$, $u$ learns about a source set closer than
$SN(s)$, pushing $s$ out of $L_u(h)$. However, since the algorithm returns $(L_v(1), \ldots, L_v(h))$ instead of
simply $L_v(h)$, we can still reconstruct the detected paths.

Lemma 4.3 For any node $v$ and any entry $(d, s, next) \in L_v(h)$, a routing path of at most $h$ hops from
$v$ to a node in $s$ of weight $d$ can be constructed using the $L$ tables at the nodes and a hop counter.

Proof: The routing decision for hop $t$ at the current node $v_{t-1}$ (where $v_0 := v$) is made by looking
up the entry $(d_{t-1}, s, next) \in L_{v_{t-1}}(h - (t - 1))$. We show by induction on the length $\ell \le h$ of a shortest
path from $v$ to its closest node $u \in SN(s)$ that such an entry always exists. Note that by Lemmas 4.1
and 4.2, such an entry satisfies that $d_{t-1} = \mathrm{wd}_{h-(t-1)}(v_{t-1}, u)$ and thus the constructed path has
weight $\mathrm{wd}_h(v, u) = d$. Trivially, the claim is true for $\ell = 0$ by initialization of the lists $L_u(0)$, $u \in V$.

Now suppose the claim holds for $\ell \in \mathbb{N}_0$ and consider node $v$ with entry $(\mathrm{wd}(v, u), s, next) \in L_v(h)$.
Suppose $w$ is the neighbor of $v$ which is next on the shortest $\ell$-hop path from $v$ to $u$. Hence it is the
endpoint of a shortest $(\ell - 1)$-hop path from $w$ to $u$, and there is no shorter path from $w$ to any
node in $SN(s)$ of at most $\ell - 1$ hops (otherwise there would be a shorter path of at most $\ell$ hops from
$v$ to a node in $SN(s)$). Therefore, by Lemma 4.1, $(\mathrm{wd}_{h-1}(w, u), s, next_w) \in L_w^*(h - 1)$ for some $next_w$.
Assuming for contradiction that $(\mathrm{wd}_{h-1}(w, u), s, next_w) \notin L_w(h - 1)$ implies, by Lemma 4.2, that there
are $\Delta$ entries $(d', s', next') \in L_w(h - 1)$ that are lexicographically smaller than $(\mathrm{wd}_{h-1}(w, u), s, next_w)$.
Node $w$ would send these smaller entries in iteration $h$ of Algorithm 1, yielding the contradiction that
$(\mathrm{wd}(v, u), s, next) \notin L_v(h)$. It follows that indeed $(\mathrm{wd}_{h-1}(w, u), s, next_w) \in L_w(h - 1)$ and the proof
concludes. □

We summarize the properties of Algorithm 1 with the following theorem. 

Theorem 4.4 Algorithm 1 computes the $h$-weighted distance and next hop of a shortest path of at
most $h$ edges from each node to its closest $\Delta$ source sets. Each node on the corresponding shortest
path can determine the next hop on the path out of the number of preceding hops and the output of the
algorithm. The time complexity of Algorithm 1 in the CONGEST model is $O(\Delta h)$ rounds.

Proof: Correctness follows from Lemmas 4.1 and 4.2. Lemma 4.3 proves that the paths can be
reconstructed as stated. The time complexity follows from the fact that the algorithm runs for $h$
iterations, and each iteration can be implemented in $O(\Delta)$ rounds in the CONGEST model since
the messages contain $O(\Delta)$ IDs and distances. □

Stateless routing. The routing mechanism suggested by Lemma 4.3 has the disadvantage that it is
stateful, as the routing decision depends on the number of previous routing hops. It is easy to make it
stateless: at each node, a packet is directed toward the hop that reported the best distance estimate,
i.e., the next hop to take at node $v$ for destination $s$ is $\arg\min_{next} \{d : (d, s, next) \in L_v(t)\}$.

Corollary 4.5 For any node $v$ and any entry $(d, s, next) \in L_v(h)$, a routing path from $v$ to a
node in $s$ of weight at most $d$ can be constructed using the local knowledge of the nodes only.

Proof: Lemma 4.3 shows that if a node $w$ follows the $next_w$ pointer of any entry $(d_w, s, next_w) \in L_w(t)$
for any $t \in \{1, \ldots, h\}$, node $next_w$ has an entry $(d_w - W(w, next_w), s, next_{next_w}) \in L_{next_w}(t - 1)$. We thus
can simply choose to follow at each node $v$ the $next$ pointer of the entry $(d, s, next) \in \bigcup_{t \in \{1, \ldots, h\}} L_v(t)$
with minimal $d$ and are guaranteed to eventually arrive at some node in $s$ using a path of weight at
most $d$. □
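
The stateless rule amounts to a minimum over all stored lists of a node. A minimal sketch follows, assuming the per-node output is available as a list of lists of `(distance, source_id, next_hop)` tuples; this representation and the function name are hypothetical, and for simplicity the scan also includes the initial list $L_v(0)$.

```python
def stateless_next_hop(tables_v, s):
    """Pick the next hop for destination source s at one node.

    tables_v: [L_v(0), L_v(1), ..., L_v(h)], each a list of
              (distance, source_id, next_hop) tuples (assumed format).
    Returns the next hop of an entry with minimal distance, or None
    if no stored list contains an entry for s.
    """
    candidates = [
        (d, nxt)
        for level in tables_v
        for (d, src, nxt) in level
        if src == s
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    tables_a = [
        [(0, "A", "a")],
        [(0, "A", "a"), (10, "D", "d")],
        [(0, "A", "a"), (10, "D", "d")],
        [(0, "A", "a"), (3, "D", "b")],
    ]
    print(stateless_next_hop(tables_a, "D"))
```

Because the decision depends only on the node's own lists and the destination, no hop counter has to travel with the packet.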

Note that in general we cannot guarantee that the constructed path has at most h hops when applying 
this mechanism; this holds true, however, if we are routing to one of the h nodes closest to the source 
of the routing request (by Lemma 2.2). This observation will be crucial for making our general routing 
scheme stateless. 

4.2 The Short-Range Scheme 

With Algorithm BSP at hand, we can now describe our short-range routing scheme. Our goal is
to allow each node to find a route to each of its closest $\tilde{O}(\sqrt{n})$ nodes. A naive application of
Algorithm BSP, where all nodes are sources, would set the overlap parameter to $\tilde{\Theta}(\sqrt{n})$ (this is the
number of nodes we want to know about), and the range parameter to $\tilde{O}(\sqrt{n})$ too (in order to find
the closest $\tilde{\Theta}(\sqrt{n})$ nodes it suffices to go to this hop-distance, cf. Lemma 2.2). However, Theorem 4.4
tells us that in this case, the time complexity would be $O(\Delta h) \subseteq \tilde{O}(n)$, a far cry from the $\tilde{\Omega}(\sqrt{n})$
lower bound from Corollaries 3.6 and 3.7. Our solution is a hierarchical bootstrapping process that
converges in double-exponential speed. We show that the stretch is proportional to the number of
stages in the hierarchy.



The Construction 




The construction is done iteratively in L stages. In the interest of clarity we describe the construction 
intuitively first and then formalize it. The idea is that on the one hand we want to spend at most a 
certain amount of time, but on the other hand with each stage try to reduce the number of landmarks 
as quickly as possible. This approach is the spirit of Thorup-Zwick distance oracles and routing 
schemes [29, 30], and it is also used in a distributed fashion in [6]. The difficulty lies in constructing 
such a hierarchy quickly.^ 

The sets of landmarks, denoted $S_1, \ldots, S_L$, are sampled uniformly and independently at random
without any coordination overhead, with $S_0 := V$ and $S_i \subseteq S_{i-1}$ for $1 \le i \le L$. In the
$i$-th stage, each node finds a route to the closest node in $S_i$ as well
as to all nodes in $S_{i-1}$ that are closer to it. This property allows us
to bound the routing stretch. The basic argument is a simple application
of the triangle inequality (see Figure 2): Consider a route
from node $v$ to node $w$. If there is a node $u \in S_1$ that is closer
to $v$ than $w$, then the route of shortest paths via $u$ has stretch at
most 3. It is therefore sufficient for $v$ to determine (the next hop
of) least-weight routes to nodes in $S_0 := V$ that are closer to it
than the closest node in $S_1$ only. Using double induction, we can
bound the stretch of the multi-stage application of this technique
we employ.
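
Spelled out for the single-stage case, with $u \in S_1$ satisfying $\mathrm{wd}(v,u) \le \mathrm{wd}(v,w)$, the factor 3 is obtained as follows. This is a worked instance of the argument above and uses only the triangle inequality and the symmetry of $\mathrm{wd}$.

```latex
\begin{align*}
\mathrm{wd}(v,u) + \mathrm{wd}(u,w)
  &\le \mathrm{wd}(v,u) + \mathrm{wd}(u,v) + \mathrm{wd}(v,w) \\
  &\le 3\,\mathrm{wd}(v,w).
\end{align*}
```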

Figure 2: The distance from $v$ to $w$ is at least one third of the length of the route from $v$ to $w$ via $u$.

^In [6], distance sketches are constructed distributedly using exhaustive search with respect to distances, i.e., Bellman-Ford
is run for sufficiently many iterations until all routes become stable. This approach has time complexity $\Omega(SPD)$
and therefore cannot guarantee a running time of $\tilde{o}(n)$ on all graphs of diameter $HD \in \tilde{o}(n)$.

To this end, in each stage we invoke Algorithm 1 with source set $S_{i-1}$. We now explain how to
choose the parameters $h_i$ and $\Delta_i$ for this invocation. Let $p_i$ be the probability of a node to be selected
into $S_i$. Then w.h.p., each node $v$ has a member of $S_i$ among the $O(\log n/p_i)$ nodes closest to $v$.
Hence, this is a good choice for the distance parameter $h_i$. The expected number of nodes from $S_{i-1}$
among the $h_i$ nodes closest to a given node is $p_{i-1} h_i$. Applying Chernoff's bound shows that this
number is bounded by $O(p_{i-1} h_i) = O(p_{i-1} \log n/p_i)$ w.h.p. This is an upper bound on the number of
sources that need to be detected by each node and therefore is our choice of the overlap parameter $\Delta_i$.

The resulting running time of the call to Algorithm BSP is $O(h_i \Delta_i) \subseteq \tilde{O}(p_{i-1}/p_i^2)$. Since this
is the dominating term in the running time in each stage, it is now easy to determine the sampling
probabilities: neglecting polylogarithmic factors, we get the simple recursion $p_i = \sqrt{p_{i-1}/T}$, where $T$
is the desired running time and $p_0 := 1$. For example, if we want to ensure a running time bound of
$T \in \tilde{O}(\sqrt{n})$, we obtain:

• sampling probabilities of $n^{-1/4}, n^{-3/8}, n^{-7/16}, \ldots$, i.e., $p_i = n^{-(2^i-1)/2^{i+1}}$;

• expected set sizes of $\Theta(n^{3/4}), \Theta(n^{5/8}), \Theta(n^{9/16}), \ldots$, i.e., $|S_i| \in \Theta(n^{1/2+1/2^{i+1}})$ w.h.p.;

• range parameters of $\Theta(n^{1/4} \log n), \Theta(n^{3/8} \log n), \Theta(n^{7/16} \log n), \ldots$, i.e., $h_i \in \Theta(n^{(2^i-1)/2^{i+1}} \log n)$;

• overlap parameters of $\Theta(n^{1/4} \log n), \Theta(n^{1/8} \log n), \Theta(n^{1/16} \log n), \ldots$, i.e., $\Delta_i \in \Theta(n^{1/2^{i+1}} \log n)$.

(Note that $L = \log\log n$ stages suffice to ensure that $|S_L| \in \Theta(\sqrt{n})$ w.h.p.) Running Algorithm BSP
with parameters as above, we get that w.h.p., after $\tilde{O}(\sqrt{n})$ time, each node knows of the closest $\Delta_i$
nodes from $S_{i-1}$ and how to route to them for all $1 \le i \le L$. But this is not sufficient: we also need
to be able to route back from the nodes in $S_i$.
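
The exponents in this example follow mechanically from the recursion. As a small sanity check (a sketch with hypothetical names, ignoring the polylogarithmic factors just as the text does), write $p_i = n^{-e_i}$ and $T = n^{1/2}$; then $p_i = \sqrt{p_{i-1}/T}$ becomes $e_i = (e_{i-1} + 1/2)/2$ with $e_0 = 0$.

```python
from fractions import Fraction


def exponents(stages):
    """Exponents e_i with p_i = n**(-e_i), from p_i = sqrt(p_{i-1} / T),
    T = n**(1/2) and p_0 = 1 (polylogarithmic factors ignored)."""
    e = [Fraction(0)]
    for _ in range(stages):
        e.append((e[-1] + Fraction(1, 2)) / 2)
    return e


if __name__ == "__main__":
    e = exponents(4)
    for i in range(1, len(e)):
        set_size = 1 - e[i]            # |S_i|  ~ n**(1 - e_i)
        range_exp = e[i]               # h_i    ~ n**e_i       (times log n)
        overlap_exp = e[i] - e[i - 1]  # Delta_i ~ n**(e_i - e_{i-1}) (times log n)
        print(i, e[i], set_size, range_exp, overlap_exp, range_exp + overlap_exp)
```

The last column is always $1/2$: the range and overlap exponents sum to one half in every stage, which is exactly the $\tilde{O}(\sqrt{n})$ bound on $h_i \Delta_i$.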

Given a node $v$, define $Y_i(v)$ to be the node closest to $v$ in $S_i$ (symmetry broken by identifiers),
and let $C_v(i) := \{u \in V \mid Y_i(u) = v\}$, i.e., for each stage $i$, the sets $C_v(i)$ are a Voronoi decomposition
of $V$ with centers $Y_i(v)$. Note that routing from a center $Y_i(v)$ to the nodes of its cell is not as simple as the
other direction: While the depth of the tree rooted at $Y_i(v)$ is bounded by $h_i$, there is no non-trivial upper bound
on the number of nodes in the tree. This can be solved by a number of standard techniques for tree
routing (e.g., [27]). To minimize space consumption, we use the technique of [29], which constructs
routing tables of size $O(1)$ and node labels of $O(\log n)$ bits in $\tilde{O}(h_i)$ time. In a nutshell, the idea is first
to count the sizes of subtrees (which can be done in $O(h_i)$ rounds) and then construct "mini routing
tables" for the "heavy" part of the tree, where a node is considered heavy if its subtree contains at
least $n/\lceil\sqrt{\log n}\rceil$ nodes. Then this process is applied recursively in the subtrees rooted at children of
heavy nodes. From the description in [29], one can verify that each recursive step of the construction
can be performed in time $O(h_i)$ in a tree of depth $h_i$ in the CONGEST($\log n$) model. There are at
most $\log_{\sqrt{\log n}} n$ recursive steps, summing up to a total of $\tilde{O}(h_i)$ rounds to construct labels and routing
tables.

Formally, given natural numbers $n$ and $L \le \log\log n$, we define the following for $1 \le i \le L$.

• $p_0 := 1$, and $p_i := (\sqrt{n})^{-(2^L/(2^L-1))(2^i-1)/2^i}$.

• For each node $v$, $Y_v(i)$ is the node from $S_i$ closest to $v$ (ties broken by hop distance and ID).

• For each $u \in S_i$, define $C_u(i) := \{v \mid Y_v(i) = u\}$, and $C_u(0) := \{u\}$.

• For each node $v$, define $H_v(i) := \{u \in S_{i-1} \mid \mathrm{wd}(v, u) \le \mathrm{wd}(v, Y_v(i))\}$.

Our construction maintains (w.h.p.) the following properties at stage $i \in \{1, \ldots, L\}$.

(1) $S_i$ is a uniformly random subset of $S_{i-1}$, where $\Pr[v \in S_i] = p_i$ and
$\Pr[v \in S_i \mid v \in S_{i-1}] = p_i/p_{i-1} = (\sqrt{n})^{-2^L/(2^i(2^L-1))}$.

(2) For any node $v$, it is possible to route from $v$ to $Y_v(i)$ on a least-weight path.

(3) For any node $v$, it is possible to compute $Y_v(i)$ and $\mathrm{wd}(v, Y_v(i))$ from the label of $v$.

(4) For any node $u \in S_i$, it is possible to route from $u$ to any node $w \in C_u(i)$ on a least-weight
path.

(5) For any node $v$, $H_v(i)$ is locally known at $v$, and it is possible to route from $v$ to any node
$u \in H_v(i)$ on a least-weight path (whose weight is known at $v$).

Suppose that we have such a hierarchy of $L$ stages. Then, given the label of any node
$w \in \bigcup_{1 \le i \le L} \bigcup_{u \in H_v(i)} C_u(i - 1)$, node $v$ can route a message to $w$ as follows: First, find some $i \in \{1, \ldots, L\}$
such that $w \in C_u(i - 1)$ for some $u \in H_v(i)$ (cf. Property (3) and Property (5) of the construction).
The route from $v$ to $w$ is then defined by the concatenation of two shortest paths: the one from $v$ to
$u$, and the one from $u$ to $w$ (cf. Property (4) and Property (5)). Moreover, the long-range scheme will
make sure that we can always route to any destination via the closest skeleton nodes in $S_L$, which is
feasible due to Property (2) and Property (4). By always choosing from the available routes such that
the weight of the computed route is minimal (which can be done by Property (3) and Property (5) for
the short-range construction, and will also be possible for the long-range scheme), routing becomes
stateless.

Stretch Analysis 

We now bound the weight of the routes constructed by the stated scheme with respect to the weight 
of the shortest paths. We note that the argument for the general case is similar in spirit to the simple 
case of i = 1 illustrated in Figure 2. We start with the following key lemma. 

Lemma 4.6 Suppose that for $v, w \in V$ and $1 \le j \le L$ we have that $w \notin \bigcup_{i=1}^{j} \bigcup_{u \in H_v(i)} C_u(i - 1)$.
Then (a) $\mathrm{wd}(v, Y_v(j)) \le (2j - 1) \cdot \mathrm{wd}(v, w)$, and (b) $\mathrm{wd}(w, Y_w(j)) \le 2j \cdot \mathrm{wd}(v, w)$.

Proof: We prove the lemma by induction on $i$ for a fixed $j$. More specifically, we show for each
$1 \le i \le j$ that (a) $\mathrm{wd}(v, Y_v(i)) \le (2i - 1)\,\mathrm{wd}(v, w)$ and (b) $\mathrm{wd}(w, Y_w(i)) \le 2i\,\mathrm{wd}(v, w)$. For the basis of the
induction, consider $i = 0$ in Statement (b). In this case, since $S_0 = V$, we have that $Y_w(0) = w$ and
Statement (b) holds because $\mathrm{wd}(v, w) \ge 0 = \mathrm{wd}(w, w)$.

For the inductive step, assume that Statement (b) holds for $0 \le i < j$ and consider $i + 1$. Since
trivially $w \in C_{Y_w(i)}(i)$, the premise of the lemma implies that $Y_w(i) \notin H_v(i + 1)$. However,
$Y_v(i + 1) \in H_v(i + 1)$, and hence we obtain

\begin{align*}
\mathrm{wd}(v, Y_v(i+1)) &< \mathrm{wd}(v, Y_w(i)) \\
&\le \mathrm{wd}(v, w) + \mathrm{wd}(w, Y_w(i)) && \text{triangle inequality} \\
&\le (2i + 1)\,\mathrm{wd}(v, w) && \text{by induction hypothesis.}
\end{align*}

This proves part (a) of the claim. Using the above inequality we also obtain

\begin{align*}
\mathrm{wd}(w, Y_w(i+1)) &\le \mathrm{wd}(w, Y_v(i+1)) && \mathrm{wd}(w, Y_w(i+1)) \le \mathrm{wd}(w, u) \text{ for } u \in S_{i+1} \\
&\le \mathrm{wd}(w, v) + \mathrm{wd}(v, Y_v(i+1)) && \text{triangle inequality} \\
&\le (2i + 2)\,\mathrm{wd}(v, w) && \text{by the proof of part (a),}
\end{align*}

which proves part (b) of the claim, completing the inductive step. □

Lemma 4.6 allows us to prove the following positive result.

Corollary 4.7 Let $v, w \in V$, and let $1 \le i_0 \le L$ be minimal such that $Y_w(i_0-1) \in H_v(i_0)$. Then
$\mathrm{wd}(v, Y_w(i_0-1)) + \mathrm{wd}(Y_w(i_0-1), w) \le (4i_0-3)\,\mathrm{wd}(v,w) \in O(L \cdot \mathrm{wd}(v,w))$.

Proof: Note that

\begin{align*}
\mathrm{wd}(v, Y_w(i_0-1)) + \mathrm{wd}(Y_w(i_0-1), w) &\le \mathrm{wd}(v,w) + 2\,\mathrm{wd}(w, Y_w(i_0-1)) && \text{triangle inequality} \\
&\le \mathrm{wd}(v,w) + 4(i_0-1)\,\mathrm{wd}(v,w) && \text{Lemma 4.6} \\
&= (4i_0-3)\,\mathrm{wd}(v,w)
\end{align*}

and the corollary is proved. □

On the other hand, if there is no $i_0$ as in the corollary, we can conclude from Lemma 4.6 that routing
via the skeleton nodes closest to source and destination, respectively, incurs bounded stretch.



Implementation and Time Complexity 

We now explain how to construct the hierarchy efficiently in more detail, and analyze the time
complexity of the construction. Algorithm 2 gives the pseudocode of the above scheme. The algorithm
is parametrized by the total number of nodes $n$ and the number $L$ of hierarchy stages. Appropriate
constants $c$ and $c'$ are supposed to be predefined in accordance with the required lower bound on the
probability of success.⁶

Algorithm 2: Distributed construction of data structure for close-distance routing at $v \in V$.
input   : $n \in \mathbb{N}$  // number of nodes
          $L \in \{1, \ldots, \log\log n\}$  // number of stages in the hierarchy
computes: $\ell_v \in \{0, \ldots, L\}$  // level of $v$; $v \in S_i \Leftrightarrow \ell_v \ge i$
          $\forall i \in \{1, \ldots, L\}$: $Y_v(i) \in S_i$  // closest node in $S_i$
          $\forall i \in \{1, \ldots, L\}$: $H_v(i) = \{w \in S_{i-1} \mid \mathrm{wd}(v,w) \le \mathrm{wd}(v, Y_v(i))\}$
          $\forall i \in \{1, \ldots, L\}\ \forall u \in H_v(i)$: $\mathrm{next}_v(u)$, $d_v(u)$  // next routing hop ($v$ if $v = u$) and distance to $u$

 1  for $i \in \{0, \ldots, L\}$ do $p_i := (\sqrt{n})^{-(2^L/(2^L-1))(2^i-1)/2^i}$
 2  $\ell_v := i$ with probability $p_i - p_{i+1}$ for $i \in \{0, \ldots, L\}$  // reading $p_{L+1}$ as $0$, so that $\Pr[\ell_v \ge i] = p_i$
 3  for $i \in \{1, \ldots, L\}$ do
 4      $h_i := c \cdot \log n / p_i$  // $c$ and $c'$ are predefined constants controlling the probability of failure
 5      $\Delta_i := c' \cdot h_i p_{i-1}$
 6      if $\ell_v \ge i-1$ then $\mathrm{source}(v) := (v, \ell_v)$ else $\mathrm{source}(v) := \bot$
 7      $(L_v(1), \ldots, L_v(h_i)) := \mathrm{BSP}(h_i, \Delta_i, \mathrm{source})$  // only $L_v(h_i)$ needed
 8      $H_v(i) := \emptyset$
 9      repeat
10          let $(d, (u, \ell_u), w)$ be the next entry in $L_v(h_i)$ in ascending lexicographic order
11          $H_v(i) := H_v(i) \cup \{u\}$
12          $\mathrm{next}_v(u) := w$; $d_v(u) := d$  // exact shortest paths, no distinction of stages needed
13      until $\ell_u \ge i$
14      $Y_v(i) := u$  // $u$ is the node from $S_i$ closest to $v$
15      construct labels of stage $i$



Choosing the sets $S_i$ is performed locally without communication. Each node $v$ has level $\ell_v$ chosen
independently so that $\Pr[\ell_v \ge i] = p_i$ (Line 2). Setting $S_i := \{v \in V \mid \ell_v \ge i\}$ as indicated in the
algorithm thus satisfies Property (1). In addition, the following properties are easily derived using the
Chernoff bound, and we state them without proof.

Lemma 4.8 For appropriate choices of the constants $c, c'$ in Algorithm 2, for all $1 \le i \le L$ it holds
w.h.p. that:

• $|S_i| \in \Theta(p_i n)$ ($|S_0| = n$).

• For all $v \in V$, $|S_i \cap \mathrm{ball}_v(h_i)| \in \Theta(\log n)$.

• For all $v \in V$, $H_v(i) \subseteq \mathrm{ball}_v(h_i)$.

• For all $v \in V$, $|H_v(i)| \in O(h_i p_{i-1}) = O(p_{i-1} \log n / p_i)$.

• For all $v \in V$, $\Delta_i \ge |H_v(i)|$.

⁶One can verify the properties of the construction and restart a failed iteration within $O(\mathrm{HD})$ time if desired, implying
that the stretch guarantee becomes deterministic and the running time probabilistically bounded instead.
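The local choice of levels can be illustrated with a short sketch. It assumes the reading $p_i = (\sqrt{n})^{-(2^L/(2^L-1))(2^i-1)/2^i}$ of Line 1 of Algorithm 2 and draws a level so that $\Pr[\ell_v \ge i] = p_i$; the function names `level_probabilities` and `sample_level` are hypothetical and not part of the paper.

```python
import math
import random


def level_probabilities(n: int, L: int) -> list[float]:
    """Return [p_0, ..., p_L] with p_i = sqrt(n) ** (-(2^L/(2^L-1)) * (2^i-1)/2^i)."""
    scale = 2**L / (2**L - 1)
    return [math.sqrt(n) ** (-scale * (2**i - 1) / 2**i) for i in range(L + 1)]


def sample_level(p: list[float], rng: random.Random) -> int:
    """Draw a level l with Pr[l >= i] = p[i] (assumption: p[L+1] is read as 0)."""
    x = rng.random()  # uniform in [0, 1)
    level = 0
    # p is decreasing with p[0] = 1, so the event {x < p[i]} has probability p[i];
    # the loop keeps the largest index i for which it occurs.
    for i in range(1, len(p)):
        if x < p[i]:
            level = i
    return level
```

With this choice $p_0 = 1$ (so $S_0 = V$) and $p_L = 1/\sqrt{n}$, which is why the top level $S_L$ has about $\sqrt{n}$ nodes in expectation.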

By these properties and Theorem 4.4, Property (5) is satisfied w.h.p., because we invoke Algorithm
BSP with sources $S_{i-1}$, depth parameter $h_i$, and overlap parameter $\Delta_i$: after this invocation, each node $v$
can identify the set $H_v(i)$ and route to any $u \in H_v(i)$ on a path of known weight; since $H_v(i) \subseteq \mathrm{ball}_v(h_i)$
w.h.p., these routing paths are shortest paths. Moreover, the invocation of $\mathrm{BSP}(h_i, \Delta_i, S_{i-1})$ allows each
node $v$ also to learn what $Y_v(i)$ is and route to it on a shortest path of known weight, establishing
Property (2). In order to satisfy Property (3), we simply add $Y_v(i)$ and $\mathrm{wd}(v, Y_v(i)) = d_v(Y_v(i))$ to the
label of $v$ for all $i$. As discussed earlier, routing tables of size $\log^{O(1)} n$ and labels of size $(1 + o(1)) \log n$
to route within $C_{Y_v(i)}$ can be constructed within $\tilde{O}(h_i)$ rounds using the scheme from [29], and we add
the respective tree label to $v$'s label to ensure Property (4).
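How a node reads $H_v(i)$ and $Y_v(i)$ off the sorted output of BSP (the repeat loop of Algorithm 2) can be sketched as follows. The `Entry` record and the function `scan_stage` are hypothetical names used only for illustration; the list is assumed to contain one entry per detected source.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    dist: float    # weighted distance d from v to the source u
    source: int    # identifier of the source u
    level: int     # level l_u of the source
    next_hop: int  # neighbor of v on the detected path to u


def scan_stage(entries: list[Entry], i: int):
    """Scan v's list for stage i in ascending order until a source of S_i is met."""
    H, next_hop, dist = set(), {}, {}
    closest = None
    # Ascending lexicographic order on (distance, (identifier, level), next hop).
    ordered = sorted(entries, key=lambda e: (e.dist, e.source, e.level, e.next_hop))
    for e in ordered:
        H.add(e.source)
        next_hop[e.source] = e.next_hop
        dist[e.source] = e.dist
        if e.level >= i:        # first source that belongs to S_i
            closest = e.source  # plays the role of Y_v(i)
            break
    return H, closest, next_hop, dist
```

Sources that are farther away than the closest node of $S_i$ are never added, which is what keeps $|H_v(i)|$ small.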

We can therefore summarize the complexity of the construction as follows. 

Lemma 4.9 Given $1 \le L \le \log\log n$, constructing the $L$-stages short-range routing tables and labels
can be done in $O(L(\sqrt{n})^{2^L/(2^L-1)} \log^2 n) \subseteq \tilde{O}((\sqrt{n})^{2^L/(2^L-1)})$ rounds, and the total label size of a node
is $O(L \log n)$.

Proof: The implementation of stage $i \in \{1, \ldots, L\}$ involves invoking BSP with parameters $h_i$ and
$\Delta_i$, which, by Theorem 4.4, takes

$O(h_i \Delta_i) = O(p_{i-1} \log^2 n / p_i^2) = O((\sqrt{n})^{2^L/(2^L-1)} \log^2 n)$

rounds. In addition, we need to relabel the nodes, which, as explained above, can be done in time
$\tilde{O}(h_i) \subseteq \tilde{O}(h_i \Delta_i)$, since the depth of the shortest paths tree is bounded by $h_i \le h_i \Delta_i$. Since there are
$L \le \log\log n$ stages, the total number of rounds thus satisfies the stated bounds. With respect to the
label size, note that each stage $i$ adds to the label of node $v$ the identifier of and distance to $Y_v(i)$ and
a tree label of size $(1 + o(1)) \log n$, for a total of $O(\log n)$ bits per stage. □
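The per-stage bound can be checked directly from the parameter choices $h_i = c \log n / p_i$ and $\Delta_i = c' h_i p_{i-1}$ of Algorithm 2. The following worked computation assumes the reading $p_i = (\sqrt{n})^{-a(1 - 2^{-i})}$ with $a := 2^L/(2^L-1)$ of Line 1; the abbreviation $a$ is introduced here only for readability.

```latex
\begin{align*}
h_i \Delta_i &= c'\, h_i^2\, p_{i-1} = c' c^2 \log^2 n \cdot \frac{p_{i-1}}{p_i^2}, \\
\frac{p_{i-1}}{p_i^2}
  &= (\sqrt{n})^{\,a\left(2(1 - 2^{-i}) - (1 - 2^{-(i-1)})\right)}
   = (\sqrt{n})^{\,a\left(2 - 2^{1-i} - 1 + 2^{1-i}\right)}
   = (\sqrt{n})^{\,a}.
\end{align*}
```

Under this reading the product $h_i \Delta_i$ is the same for every stage $i$, which is the reason the sampling probabilities are spaced geometrically in the exponent.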

4.3 Long-Distance Routing 

We now explain how to route between the nodes in the top level of the hierarchy created by the 
short-range scheme. Our central concept is the skeleton graph, defined as follows. 

Definition 4.10 (Skeleton Graph) Let $G = (V, E, W)$ be a weighted graph. Given $S \subseteq V$ and
$h \in \mathbb{N}$, the $h$-hop skeleton-$S$ graph is the weighted graph $G_{S,h} = (S, E_{S,h}, W_{S,h})$ defined by

• $E_{S,h} := \{\{v, w\} \mid v, w \in S,\ v \ne w, \text{ and } \mathrm{hd}(v, w) \le h\}$

• For $\{v, w\} \in E_{S,h}$, define $W_{S,h}(v, w)$ to be the $h$-weighted distance between $v$ and $w$ in $G$, i.e.,
$W_{S,h}(v, w) := \mathrm{wd}_h(v, w)$.
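As a centralized illustration of Definition 4.10 (not the distributed construction of the paper), the skeleton graph can be computed by running $h$ synchronous Bellman-Ford rounds from every skeleton node. The function name `skeleton_graph` and the edge-dictionary input format are assumptions of this sketch.

```python
import math


def skeleton_graph(n, edges, S, h):
    """Return {frozenset({v, w}): wd_h(v, w)} for the h-hop skeleton graph on S.

    n: number of nodes (labelled 0..n-1); edges: dict {(u, v): weight} of the
    undirected graph G; S: iterable of skeleton nodes; h: hop bound.
    """
    adj = {u: [] for u in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    skeleton = {}
    nodes = sorted(S)
    for s in nodes:
        # After r rounds, dist[x] is the least weight of an s-x path with <= r hops.
        dist = [math.inf] * n
        dist[s] = 0.0
        for _ in range(h):
            new = dist[:]
            for u in range(n):
                if dist[u] < math.inf:
                    for v, w in adj[u]:
                        new[v] = min(new[v], dist[u] + w)
            dist = new
        for t in nodes:
            if t > s and dist[t] < math.inf:  # finite iff hd(s, t) <= h
                skeleton[frozenset((s, t))] = dist[t]
    return skeleton
```

On a path 0-1-2-3 with unit weights and an extra edge {0, 3} of weight 10, the skeleton edge between 0 and 3 has weight 10 for $h = 1$ and weight 3 for $h = 3$, which shows how a too small hop bound distorts distances.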

The main idea in the long-distance scheme is to construct a skeleton graph with $S = S_L$ (the top
level of the short-range hierarchy as constructed in Section 4.2). The choice of $h$ needs to balance two
goals: on the one hand, the skeleton graph needs to accurately reflect the distances of skeleton nodes
in $G$, and on the other hand, we must be able to quickly set up tables that allow routing of small
stretch between the skeleton nodes.

A simple but crucial observation on skeleton graphs is that if the skeleton $S$ is a random set of
nodes, and if $h \in \Omega(n \log n / |S|)$, then w.h.p., the distances in $G_{S,h}$ are equal to the corresponding
distances in $G$. This means that it suffices to consider paths of $O(n \log n / |S|)$ hops in $G$ in order to
find the exact distances in $G$. The following lemma formalizes this idea. (We state it for a skeleton
containing a random subset; this generality will become useful in Section 5.3.)

Lemma 4.11 Let $S_R$ be a set of random nodes defined by $\Pr[v \in S_R] = \pi$ independently for all nodes
for some given $\pi$. Let $S \supseteq S_R$. If $\pi \ge c \log n / h$ for a sufficiently large constant $c > 0$, then w.h.p.,
$\mathrm{wd}_{S,h}(v, w) = \mathrm{wd}(v, w)$ for all $v, w \in S$.

Proof: Fix $v, w \in S$. Clearly, $\mathrm{wd}_{S,h}(v,w) \ge \mathrm{wd}(v,w)$ because each path in $G_{S,h}$ corresponds to
a path of the same weight in $G$. We need to show that $\mathrm{wd}_{S,h}(v,w) \le \mathrm{wd}(v,w)$ as well. Let $p =
(u_0 = v, u_1, \ldots, u_{\ell(p)} = w)$ be a shortest path connecting $v$ and $w$ in $G$, i.e., $W(p) = \mathrm{wd}(v,w)$. We
show, by induction on $\ell(p)$, that $\mathrm{wd}_{S,h}(v,w) \le W(p)$ w.h.p.

For the basis of the induction note that if $\ell(p) \le h$, then by definition $\mathrm{wd}_{S,h}(v,w) \le W(p) =
\mathrm{wd}(v, w)$ and we are done. For the inductive step, assume that the claim holds for all values of
$\ell(p) \le i$ for some $i \ge h$ and consider a path of length $\ell(p) = i + 1$. Now, $E[\,|S \cap \{u_1, \ldots, u_i\}|\,] \ge
E[\,|S_R \cap \{u_1, \ldots, u_i\}|\,] = i\pi \ge h\pi \in \Omega(\log n)$, and hence, applying Chernoff's bound, we may conclude
that w.h.p. the intersection is non-empty. Let $u \in \{u_1, \ldots, u_i\} \cap S$. Since $p$ is a shortest path in $G$,
so are $(v, \ldots, u)$ and $(u, \ldots, w)$. Both these paths are of length at most $i$, implying by the induction
hypothesis that $\mathrm{wd}_{S,h}(v,u) \le \mathrm{wd}(v,u)$ and $\mathrm{wd}_{S,h}(u,w) \le \mathrm{wd}(u,w)$ w.h.p., respectively. Therefore
$\mathrm{wd}_{S,h}(v,w) \le \mathrm{wd}_{S,h}(v,u) + \mathrm{wd}_{S,h}(u,w) \le \mathrm{wd}(v,u) + \mathrm{wd}(u, w) = W(p) = \mathrm{wd}(v,w)$, completing the
induction. Note that the total number of events we consider throughout the induction is bounded by
a polynomial in $n$, and since the probability of the bad events is polynomially small, the union bound
allows us to deduce that the claim holds w.h.p. □
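The non-emptiness step in the proof can also be verified by a direct computation that uses only the independence of the sampling, as an alternative to invoking Chernoff's bound. For $i \ge h$ and $\pi \ge c \log n / h$:

```latex
\begin{align*}
\Pr\bigl[S_R \cap \{u_1, \ldots, u_i\} = \emptyset\bigr]
  &= (1 - \pi)^{i}
   \le e^{-\pi i}
   \le e^{-\pi h}
   \le e^{-c \log n}
   = n^{-\Omega(c)}.
\end{align*}
```

Since $S \supseteq S_R$, the same bound holds for the event that $S$ misses $\{u_1, \ldots, u_i\}$.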

Based on this observation, an obvious strategy to solve long-distance routing is to construct Gs,h 
and compute its all-pairs shortest paths. But implementing this approach is not straightforward. 
First, the edges of the skeleton graph are virtual: each edge represents the shortest path of up to $h$
hops in $G$; and second, the number of skeleton graph edges may be as large as $\Omega(|S|^2)$. We solve both
problems together: While computing the edges of the skeleton graph, we sparsify the graph, bringing
the number of edges down to near-linear in the skeleton size. Once we are done, we can afford to let 
each skeleton node learn the full topology of the sparsified skeleton graph, from which approximate 
all-pairs routes and distances can be computed locally. 

Technically, we use the classical concept of sparse spanners, defined as follows. 

Definition 4.12 (Weighted $k$-Spanners) Let $H = (V, E, W)$ be a weighted graph and let $k \ge 1$.
A weighted $k$-spanner of $H$ is a weighted graph $H' = (V, E', W')$ where $E' \subseteq E$, $W'(e) = W(e)$ for
all $e \in E'$, and $\mathrm{wd}_{H'}(u, v) \le k \cdot \mathrm{wd}_H(u, v)$ for all $u, v \in V$ (where $\mathrm{wd}_H$ and $\mathrm{wd}_{H'}$ denote weighted
distances in $H$ and $H'$, respectively).
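Definition 4.12 translates directly into a brute-force check, sketched below with one Dijkstra run per node in both graphs. This is only a centralized verifier for small examples; the names `is_k_spanner`, `adjacency`, and `dijkstra` and the edge-dictionary format are assumptions of the sketch.

```python
import heapq
import math


def dijkstra(n, adj, src):
    """Weighted single-source distances in a graph with non-negative weights."""
    dist = [math.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for e, w in edges.items():
        u, v = tuple(e)
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def is_k_spanner(n, edges, sub_edges, k):
    """edges, sub_edges: dict {frozenset({u, v}): weight} on nodes 0..n-1."""
    # E' must be a subset of E and must keep the original weights.
    if any(e not in edges or edges[e] != w for e, w in sub_edges.items()):
        return False
    adj_h, adj_s = adjacency(n, edges), adjacency(n, sub_edges)
    for u in range(n):
        d_h, d_s = dijkstra(n, adj_h, u), dijkstra(n, adj_s, u)
        if any(d_s[v] > k * d_h[v] for v in range(n)):
            return False
    return True
```

For a unit-weight triangle, dropping one edge yields a 2-spanner but not a 1-spanner, since the two endpoints of the dropped edge are now at distance 2.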
We shall compute a spanner of the skeleton graph, while running on the underlying physical graph,
without ever constructing the skeleton graph explicitly. We do this by simulating the spanner
construction algorithm of Baswana and Sen [3] on the implicit skeleton graph. Let us recall the algorithm
of [3]; we use a slightly simpler variant that may select some additional edges, albeit without affecting
the probabilistic upper bound on the number of spanner edges (cf. Lemma 4.14). The input is a graph
$H = (V_H, E_H, W_H)$ and a parameter $k \in \mathbb{N}$.

1. Initially, each node is a singleton cluster: $R_1 := \{\{v\} \mid v \in V_H\}$.

2. Repeat $k - 1$ times (the $i$-th iteration is called "phase $i$"):

(a) Each cluster from $R_i$ is marked independently with probability $|V_H|^{-1/k}$. $R_{i+1}$ is defined
to be the set of clusters marked in phase $i$.

(b) If $v$ is a node in an unmarked cluster:

i. Define $Q_v$ to be the set of edges that consists of the lightest edge from $v$ to each of the
clusters in $R_i$ that $v$ is adjacent to.

ii. If $v$ has no adjacent marked cluster, then $v$ adds to the spanner all edges in $Q_v$.

iii. Otherwise, let $u$ be the closest neighbor of $v$ in a marked cluster. In this case $v$ adds
to the spanner the edge $\{v,u\}$, and also all edges $\{v,w\} \in Q_v$ with $(W_H(v,w), w) <
(W_H(v,u), u)$ (i.e., the identifiers $w, u$ break symmetry in case $W_H(v,w) = W_H(v,u)$).
Furthermore $v$ joins the cluster of $u$ (i.e., if $u$ is in cluster $X$, then $X := X \cup \{v\}$).

3. Each node $v$ adds, for each cluster $X \in R_k$ it is adjacent to, the lightest edge connecting it to
$X$.

For this algorithm, Baswana and Sen prove the following result. 

Theorem 4.13 ([3]) Given a weighted graph $H = (V_H, E_H, W_H)$ and an integer $k \ge 1$, the algorithm
above computes a $(2k-1)$-spanner of the graph. It has $O(k|V_H|^{1+1/k} \log n)$ edges w.h.p.⁷

Constructing the Skeleton Graph 

In our case, each edge considered in Steps (2b) and (3) of the spanner algorithm corresponds to a
shortest path. Essentially, we implement these steps in our setting by letting each skeleton node find
its closest $O(|S|^{1/k} \log n)$ clusters (w.h.p.) by running Algorithm BSP. We now explain how. First,
all nodes $v$ in a cluster $X$ use the same source identifier $\mathrm{source}(v) = X$ (as if they were connected
by a 0-weight edge to a virtual node $X$). This ensures that the overlap parameter needs to account
for the number of detected clusters only, i.e., the number of nodes per cluster is immaterial. Note
that this implies that the plain version of Algorithm BSP does not permit to determine to which
node a skeleton edge connects; hence we append to each communicated triple $(d, s, \mathrm{next})$ the identifier
of the actual endpoint $u$ of the respective path (a skeleton node with source identifier $s$) and store it
when adding a corresponding triple to $L_v$ (without otherwise affecting the algorithm). We refer to the
modified algorithm as BSP'. Second, regarding the range parameter, Lemma 4.11 shows that it is
sufficient to consider paths of $O(n \log n / |S|)$ hops only. Finally, the following lemma implies that we
may modify the spanner construction algorithm in a way that allows us to use a small overlap parameter.

⁷In [3], it is proved that the expected number of edges is $O(k|V_H|^{1+1/k})$. The modified bound directly follows from
Lemma 4.14.
Lemma 4.14 W.h.p., the execution of the centralized spanner construction algorithm yields identical
results if in Steps (2b) and (3), each node considers the lightest edges to the $c|V_H|^{1/k} \log n$ closest
clusters only (for a sufficiently large constant $c > 0$).

Proof: Fix a node $v$ and a phase $1 \le i < k$. If $v$ has at most $c|V_H|^{1/k} \log n$ adjacent clusters, the lemma
is trivially true. So suppose that $v$ has more than $c|V_H|^{1/k} \log n$ adjacent clusters. By the specification
of Step (2b), we are interested only in the clusters closer than the closest marked cluster. Now, the
probability that none of the closest $c|V_H|^{1/k} \log n$ clusters is marked is $(1 - |V_H|^{-1/k})^{c|V_H|^{1/k} \log n} \in
n^{-\Omega(c)}$. In other words, choosing a sufficiently large constant $c$, we are guaranteed that w.h.p., at least
one of the closest $c|V_H|^{1/k} \log n$ clusters is marked. Regarding Step (3), observe that a cluster gets
marked in all of the first $k - 1$ iterations with independent probability $|V_H|^{-(k-1)/k}$. By Chernoff's
bound, the probability that more than $c|V_H|^{1/k} \log n$ clusters remain in the last iteration is thus
bounded by $2^{-\Omega(c \log n)} = n^{-\Omega(c)}$. Therefore, w.h.p. no node is adjacent to more than $c|V_H|^{1/k} \log n$
clusters in Step (3), and we are done. □

As a consequence of Lemma 4.14, we may invoke Algorithm BSP with $\Delta \in O(|S|^{1/k} \log n)$, and the time
complexity of the invocation is $O(|V| \cdot |S|^{-1+1/k} \log^2 n)$. Detailed pseudo-code of our implementation
is given in Algorithm 3. Each skeleton node $v \in S$ records the ID of its cluster in phase $i$ as $F_i(v)$;
nodes in $V \setminus S$ or those who do not join a cluster in some round $i$ have $F_i(v) = \bot$. Algorithm 4 is
used as subroutine to implement Steps (2b) or (3) (Lines 9 or 20 of Algorithm 3, respectively).

To prove the algorithm correct, we show that its executions can be mapped to executions of the 
centralized algorithm, and then apply Theorem 4.13. Below, we sketch the main points of such a 
mapping. The implementation of Algorithm 3 is quite straightforward. Note that the broadcast steps 
in Line 1, 8, and 19 ensure that all nodes know the clusters and which are the active clusters in 
each phase. The random choices (Line 7) are made by cluster leaders, namely the nodes v for which 
Fi{v) = V. Lines 10-18 are local computations each node does to get a global picture of the clusters 
for the next phase. The correctness of the implementation of the edge selection of Steps 2b and 3 
of the centralized algorithm by Algorithm 4 was discussed above. We summarize with the following 
lemma. 

Lemma 4.15 Suppose the set $S$ input to Algorithm 3 contains a uniformly random subset $S_R$ of $V$
and set $h(S_R) := c \cdot n \log n / |S_R|$ for a sufficiently large constant $c$. Then w.h.p. the following holds.

(i) Algorithm 3 computes a weighted $(2k-1)$-spanner of the skeleton graph $G_{S,h(S_R)}$ that is known
at all nodes and has $O(|S|^{1+1/k} \log n)$ edges.
(ii) The weighted distances between nodes in $S$ are identical in $G_{S,h(S_R)}$ and $G$.
(iii) The algorithm terminates in $\tilde{O}(n|S|^{1/k}/|S_R| + |S|^{1+1/k} + \mathrm{HD})$ rounds.

Proof: To prove Statement (i), we note that Algorithm 3 simulates the centralized algorithm, except
for considering only the closest $O(|S|^{1/k} \log n)$ clusters in Lines 9 and 20. By Lemma 4.14 and by
Theorem 4.4, this results in a (simulated) correct execution of the centralized algorithm w.h.p. Hence
Statement (i) follows from Theorem 4.13.

Regarding Statement (ii), observe that if $h(S_R) \ge n - 1$, the statement holds by definition since
shortest paths cannot contain cycles and thus $G_{S,h(S_R)} = G_{S,n-1}$. Otherwise, we have that $|S_R| \ge
c \cdot \log n$, implying by Chernoff's bound that w.h.p., the probability to select a node into $S_R$ is $\pi \in$
Algorithm 3: Construction of long range routing skeleton at $v$.
input : $S$: set of skeleton nodes
        $k$: integer in $[1, \log n]$  // determines approximation ratio and number of spanner edges
output: $E_{h,k}$: spanner edges of skeleton graph  // $h$ is defined in Line 4
        $W_{h,k} : E_{h,k} \to \mathbb{R}^+$  // weights of spanner edges

 1  $R_1 := \{\{w\} \mid w \in S\}$  // initial clusters are singletons of $S$
 2  Broadcast $R_1$ to all nodes
 3  foreach $w$ do if $w \in R_1$ then $F_1(w) := w$ else $F_1(w) := \bot$  // initializing leaders
 4  $h := c \cdot n \log n / |S|$  // the constant $c$ controls the probability of failure
 5  $\Delta := c \cdot |S|^{1/k} \log n$
 6  for $i := 1$ to $k - 1$ do
 7      $R_{i+1} :=$ uniformly random subset of $R_i$ of size $|S|^{1-i/k}$  // select marked clusters
 8      Broadcast $R_{i+1}$ to all nodes
 9      $(E(i), W(i)) := \mathrm{edges}(F_i, R_{i+1}, h, \Delta)$  // select spanner edges, phase $i$
10      foreach $w$ do
11          if $F_i(w) \in R_{i+1}$ then
12              $F_{i+1}(w) := F_i(w)$
13          else
14              Let $E_w$ be the edges incident to $w$ in $E(i)$
15              if $E_w \ne \emptyset$ then
16                  Let $\{w, u\}$ be the heaviest edge in $E_w$
17                  if marked($u$) then $F_{i+1}(w) := F_i(u)$
18                  else $F_{i+1}(w) := \bot$
19      Broadcast $F_{i+1}$ to all nodes
20  $(E(k), W(k)) := \mathrm{edges}(F_k, \emptyset, h, \Delta)$  // final phase
21  foreach $e \in \bigcup_{i=1}^{k} E(i)$ do $W_{h,k}(e) := W(i)(e)$ for the phase $i$ with $e \in E(i)$
22  Broadcast $E_{h,k} := \bigcup_{i=1}^{k} E(i)$ and $W_{h,k}$ to all nodes


$\Theta(|S_R|/n) = \Theta((c \log n)/h(S_R))$. As by assumption $c$ is sufficiently large, Statement (ii) now follows
from Lemma 4.11.

For Statement (iii), consider first an invocation of Algorithm 4. By Theorem 4.4, the invocation
of Algorithm 1 in Line 3 takes $O(\Delta h) \subseteq \tilde{O}(|S|^{1/k} h(S_R)) = \tilde{O}(n|S|^{1/k}/|S_R|)$ rounds. The
broadcast of Line 10 is done globally. Each skeleton node may communicate up to $O(|S|^{1/k} \log n)$
pieces of information for a total of $\tilde{O}(|S|^{1+1/k})$ items. Doing this over a global BFS tree takes
$\tilde{O}(\mathrm{HD} + |S|^{1+1/k})$ rounds. As $k \le \log n$, the total cost of all invocations of Algorithm 4 is thus
bounded by $\tilde{O}(n|S|^{1/k}/|S_R| + |S|^{1+1/k} + \mathrm{HD})$ rounds. Consider now Algorithm 3. The only non-local
steps other than the invocations of Algorithm 4 are the broadcasts, of which the most time consuming
is the one in Line 22, which takes $O(k|S|^{1+1/k} \log n + \mathrm{HD}) \subseteq \tilde{O}(|S|^{1+1/k} + \mathrm{HD})$ rounds. □



Routing on the Skeleton Graph 

Algorithm 3 constructs a $(2k-1)$-spanner of the skeleton graph and makes it known to all nodes.
This enables each node to determine low-stretch routing paths between any two skeleton nodes in
Algorithm 4: edges: Edge detection and announcement for long range routing skeleton at $v \in V$.
input : $F : V \to V \cup \{\bot\}$  // locally known, $v$'s leader if $v$ is in a cluster, otherwise $\bot$
        $R \subseteq V$  // globally known, indicates (identifiers of leaders of) marked clusters
        $h$  // globally known, depth parameter of the search
        $\Delta$  // globally known, number of closest source clusters to detect
output: $E^+$: edges added to the spanner
        $W^+ : E^+ \to \mathbb{R}^+$ edge weights

 1  foreach $w \in V$ do
 2      $\mathrm{source}(w) := (F(w), 0)$ if $F(w) \notin R \cup \{\bot\}$;
        $\mathrm{source}(w) := (F(w), 1)$ if $F(w) \in R$;  // distinguish marked from unmarked clusters
        $\mathrm{source}(w) := \bot$ else
 3  $L_v := \mathrm{BSP}'(h, \Delta, \mathrm{source})$  // variant of Algorithm BSP that keeps track of path endpoints
 4  $E^+ := \emptyset$
 5  if $F(v) \notin R \cup \{\bot\}$ then
 6      // for each entry $(wd, (f, b), u, w) \in L_v$, $u$ is the next hop on a path of weight $wd$ to $w$ in cluster $f$
 7      $L_v := L_v \setminus \{(0, (F(v), 0), v, v)\}$  // remove loops (clusters are in distance 0 of themselves)
 8      // recall that $L_v$ is ordered; first entry with $b = 1$ corresponds to closest marked cluster
 9      foreach $(wd, (f, b), u, w) \in L_v$ do
10          broadcast $(wd, \{v, w\})$  // all nodes perform operation!
11          $E^+ := E^+ \cup \{\{v, w\}\}$
12          $W^+(\{v, w\}) := wd$
13          if $f \in R$ then
14              break  // $f$ is closest marked cluster
15  return $(E^+, W^+)$




$G_{S,h(S_R)}$ by local computation. To use this information, we must be able, for each spanner edge
$\{s,t\} \in E_{S,h(S_R)}$, to route on a corresponding path in $G$, i.e., a path of weight $W_{S,h(S_R)}(s, t)$. Since
we rely on Algorithm BSP during the construction of the spanner, Theorem 4.4 shows that we can
use the computation to enable for each such edge to route from $s$ to $t$ or from $t$ to $s$: if, say, $s$ added
the edge to the spanner, then following the pointers computed during the execution of Algorithm BSP
yields a path of weight $W_{S,h(S_R)}(s, t)$ from $s$ to $t$. However, in this case $t$ might not add $\{s,t\}$ to the
spanner as well, and hence there is no guarantee that we have sufficient information to route in both
directions.⁸ To resolve this issue, we add a post-processing step where we "reverse" the unidirectional
routing paths, i.e., inform the nodes on the paths about their predecessors. Note that this cannot be
done in a purely local manner, as exchanging the Bellman-Ford routing pointers between neighbors
will not tell a node $s \in S$ which pointer to follow to reach a specific node $t \in S$ for which $\{s,t\}$
is part of the spanner. However, Corollary 4.4 states that the (unidirectional) routing paths at our
disposal have at most $h(S_R)$ hops. Taking into account that the spanner has few edges, it follows that
establishing bidirectional routing pointers can be performed sufficiently fast.

Lemma 4.16 Let $\{s,t\}$ be an edge of the spanner of $G_{S,h(S_R)}$ selected by Algorithm 3. W.h.p.,
after completing the algorithm, each node on the least-weight $s$-$t$ path of at most $h(S_R)$ hops in $G$
can determine the next hop on this path and the weight of the remaining subpath when routing from $s$ to $t$
within $\tilde{O}(n/|S_R| + |S|^{1+1/k})$ rounds.

⁸Note that unidirectionality is not an artifact of the specific implementation we picked. E.g., in a star graph, the
center has degree $n-1$, as it does in the spanner. Hence we cannot expect the Bellman-Ford pointers to give sufficient
information for bidirectional routing without further processing.

Proof: For each edge $\{s,t\}$ added to the spanner by a node $s$, we route a message on the shortest
path of at most $h(S_R)$ hops from $s$ to $t$ in $G$. This message initially contains the weight of the path,
and each node on the path subtracts the weight of the incoming path from this value. By Theorem 4.4
this is feasible. When a node receives the message, it records the immediate sender as the next hop
on the path to $s$ and the weight for future reference. By Lemma 4.15, there are at most $O(|S|^{1+1/k} \log n)$
edges in the constructed spanner of $G_{S,h(S_R)}$ w.h.p., implying that the maximal number of messages
routed over each edge of $G$ is bounded by $\tilde{O}(|S|^{1+1/k})$ w.h.p. as well. Moreover, no routing path has
more than $h(S_R) = c \cdot n \log n / |S_R|$ hops. Since the messages traverse shortest $h$-hop paths, all of them
reach their destinations within the stated number of rounds [18]. □

We now summarize the properties of the long-distance scheme. 

Theorem 4.17 Suppose the set $S$ input to Algorithm 3 is a superset of a uniformly random subset
$S_R \subseteq V$ and $k \in \{1, \ldots, \log n\}$. Then, w.h.p., within $\tilde{O}(n|S|^{1/k}/|S_R| + |S|^{1+1/k} + \mathrm{HD})$ rounds, there
are routing tables for routing between nodes in $S$ with stretch $(2k-1)$.

Proof: Directly follows from Lemmas 4.15 and 4.16. □
4.4 Putting the Pieces Together 

Equipped with the results for the short-range and for the long-distance routing, we can state the 
overall algorithm as a simple composition of the two, linked by identifying the skeleton set from the
long-distance algorithm with the top level of the hierarchy $S_L$ of the short-range algorithm. We run
the long-range algorithm with parameter k to construct and make globally known the routing skeleton 
and apply the short-range routing scheme with parameter L to deal with nearby nodes. 

Recall that the label of a node $w$ is $\lambda(w) = ((Y_w(i), \mathrm{wd}(w, Y_w(i)), \mathrm{tree}_w(i)))_{i=0}^{L}$, where $\mathrm{tree}_w(i)$
denotes the label of $w$ in the tree on $C_{Y_w(i)}$, and $Y_w(0)$ is simply $w$. Given the label $\lambda(w)$ to a node $v$,
$v$ decides on the next routing hop as follows.

• If $Y_v(i) = Y_w(i)$ for some $i$, choose the next routing hop within $C_{Y_w(i)}$ to $w$ according to the
respective tree label. In this case, $d$ is the distance from $v$ to $w$ in the tree (which can be
computed from the distances of $v$ and $w$ to the root and whether the next routing hop is
the parent of $v$ or a child).

• Otherwise, node $v$ determines for each $i \in \{1, \ldots, L\}$ whether $Y_w(i-1) \in H_v(i)$. If so, it
computes $d_i := \mathrm{wd}(v, Y_w(i-1)) + \mathrm{wd}(Y_w(i-1), w)$. Otherwise set $d_i := \infty$.

• Next, denote by $S_v \subseteq S_L$ the set of skeleton nodes for which $v$ stores a routing pointer
and the corresponding path weight, and let for $s \in S_v$ $d_s$ be this weight. We define $\mathrm{wd}^k$ to
be the distance function on the spanner of the skeleton graph. Node $v$ computes $d_{L+1} :=
\min_{s \in S_v}\{d_s + \mathrm{wd}^k(s, Y_w(L)) + \mathrm{wd}(Y_w(L), w)\}$.

• Finally, $v$ computes $d := \min_{i \in \{1, \ldots, L+1\}}\{d_i\}$, and determines the next routing hop in accordance
with the corresponding path (ties broken by preferring smaller $i$), where we use the routing
mechanism from Corollary 4.5.

Since $v$ stores the tree routing tables for all trees on $C_{Y_v(i)}$, the sets $H_v(i)$ and the distances to the
nodes in $H_v(i)$, and the complete spanner of the skeleton graph, together with the label $\lambda(w)$ it has the
necessary information to perform all the above computations. Moreover, a next routing hop is always
determined, since $Y_v(L) \in S_v$ (by Property (2) of the short-range scheme) and therefore the set of
considered paths in the second step is non-empty. Finally, the routing decision is stateless, since it
depends on the local routing tables of $v$ and $\lambda(w)$ only.
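The computation of $d_1, \ldots, d_{L+1}$ and the tie-breaking rule can be sketched as follows for the case that no common tree is used. The function `route_decision` and its dictionary-based inputs are hypothetical; they stand in for the routing tables of $v$ and the label $\lambda(w)$, and distances on the skeleton spanner are assumed to be precomputed.

```python
import math


def route_decision(L, Y_w, wd_w, H_dist, skeleton_ptrs, spanner_dist):
    """Return (d, i): the distance estimate and the index of the chosen route.

    Y_w[i], wd_w[i]: Y_w(i) and wd(w, Y_w(i)) for i = 0..L, read off the label of w.
    H_dist[i]: dict u -> wd(v, u) for the nodes u in H_v(i), for i = 1..L.
    skeleton_ptrs: dict s -> d_s for the skeleton nodes s in S_v.
    spanner_dist: dict (s, t) -> distance between s and t on the skeleton spanner.
    """
    d = [math.inf] * (L + 2)  # d[1..L+1] are used; d[0] is a placeholder
    for i in range(1, L + 1):
        u = Y_w[i - 1]
        if u in H_dist[i]:  # Y_w(i-1) lies in H_v(i)
            d[i] = H_dist[i][u] + wd_w[i - 1]
    target = Y_w[L]
    for s, d_s in skeleton_ptrs.items():
        hop = 0.0 if s == target else spanner_dist.get((s, target), math.inf)
        d[L + 1] = min(d[L + 1], d_s + hop + wd_w[L])
    # min() returns the first minimal index, so ties prefer the smaller i.
    best = min(range(1, L + 2), key=lambda i: d[i])
    return d[best], best
```

The function only reads local tables and the destination label, mirroring the statelessness argument above.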

In order to show that indeed a route to w of bounded stretch is determined by the above routing 
decisions, we will show two properties. First, the value d computed is the weight of a path of bounded 
stretch whose next routing hop u is exactly the one computed by v, and second, the next node u on 
the path will compute a distance of at most d — wd{v,u) to w. Since edge weights are strictly positive, 
the latter immediately implies that the routes are acyclic and will eventually reach their destination. 

Lemma 4.18 Fix any choice of the parameters $L$ and $k$ of the short range and long distance schemes,
respectively. For any node $v$ and label $\lambda(w)$, consider the distance value $d$ and next routing hop next
computed by $v$ according to the above scheme. Then w.h.p., $d \le (8kL - 1)\,\mathrm{wd}(v,w)$ and next will
compute a value $d' \le d - W(v, \mathrm{next})$.

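Proof: We show that $d \le (8kL - 1)\,\mathrm{wd}(v, w)$ first. If $w \in \bigcup_{u \in H_v(i)} C_u(i-1)$ for some $i \in \{1, \ldots, L\}$,
observe that $d \le \mathrm{wd}(v, Y_w(i-1)) + \mathrm{wd}(Y_w(i-1), w)$, and thus by Corollary 4.7 $d \le (4L - 3)\,\mathrm{wd}(v,w)$.
Otherwise, we have that $d = d_{L+1}$ since no other routes are known to $v$. By definition and Property (2)
of the short-range scheme, $d_{L+1} \le \mathrm{wd}(v, Y_v(L)) + \mathrm{wd}^k(Y_v(L), Y_w(L)) + \mathrm{wd}(Y_w(L), w)$. We bound

\begin{align*}
&\mathrm{wd}(v, Y_v(L)) + \mathrm{wd}^k(Y_v(L), Y_w(L)) + \mathrm{wd}(Y_w(L), w) \\
&\quad\le \mathrm{wd}(v, Y_v(L)) + (2k-1)\,\mathrm{wd}(Y_v(L), Y_w(L)) + \mathrm{wd}(Y_w(L), w) && \text{by Theorem 4.17} \\
&\quad\le 2k\,\mathrm{wd}(Y_v(L), v) + (2k-1)\,\mathrm{wd}(v, w) + 2k\,\mathrm{wd}(w, Y_w(L)) && \text{triangle inequality} \\
&\quad\le (2k(4L-1) + (2k-1))\,\mathrm{wd}(v, w) && \text{Lemma 4.6} \\
&\quad= (8kL - 1)\,\mathrm{wd}(v, w),
\end{align*}

proving that indeed $d \le (8kL - 1)\,\mathrm{wd}(v,w)$.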

Now let next be the routing hop corresponding to $d$ computed by $v$. Due to Property (4) and
Property (5) of the short-range scheme, there are the following three cases:

• next is on the shortest path from $v$ to $Y_w(i-1) \in H_v(i)$ for some $i \in \{1, \ldots, L\}$ (this covers also
the case that $Y_v(i-1) = Y_w(i-1)$, and in the tree on $C_{Y_w(i-1)}$ the connecting path traverses
the root $Y_w(i-1)$);

• next is on a path of weight $d_s$ to the node $s \in S_v$ minimizing the expression $d_s + \mathrm{wd}^k(s, Y_w(L)) +
\mathrm{wd}(Y_w(L), w)$;

• next is on the shortest path from $Y_w(i)$ to $w$ for some $i \in \{1, \ldots, L\}$ (i.e., $Y_v(i) = Y_w(i)$, and in
the tree on $C_{Y_w(i)}$ the connecting path does not traverse the root $Y_w(i)$).

Regarding the first case, observe that since we are talking about shortest paths in $G$ (not shortest
$h$-hop paths), any source closer to next than $Y_w(i-1)$ will also be closer to $v$ than $Y_w(i-1)$. Hence
$Y_w(i-1) \in H_{\mathrm{next}}(i)$. Since $\mathrm{wd}(\mathrm{next}, Y_w(i-1)) = \mathrm{wd}(v, Y_w(i-1)) - W(v, \mathrm{next})$, consequently next
will compute a distance of at most $d - W(v, \mathrm{next})$ to $w$.

In the second case, next is either the next hop on a routing path as constructed in Corollary 4.5 or
as constructed by the "path reversal" from Lemma 4.16. Either way, the statements show that next
will know the next routing hop to $s$ as well as the weight of the path. Since it knows the entire skeleton
graph, it will thus compute a distance of at most $d_s - W(v, \mathrm{next}) + \mathrm{wd}^k(s, Y_w(L)) + \mathrm{wd}(Y_w(L), w)$ to
$w$ as claimed.

For the third and final case, the statement trivially holds, since routing in $C_{Y_w(i)}$ according to the
tree routing table is on shortest paths and will clearly lead to another node in $C_{Y_w(i)}$. □

It is fairly straightforward to set k and L to obtain a trade-off between the stretch of the routing 
scheme and the construction time. Specifically, we can now state our main result as follows. 

Theorem 4.19 Let $1/2 \le \alpha \le 1$ be given. Define $k := \lceil 1/(2\alpha - 1) \rceil$ if $\alpha > 1/2 + 1/\log n$, and
$k := \log n$ otherwise. Tables for stateless routing and distance approximation with stretch $\rho(\alpha) =
8k\lceil \log(k+1) \rceil - 1$ and label size $O(\log(k+1) \log n)$ can be constructed in the CONGEST model
in $\tilde{O}(n^{\alpha} + \mathrm{HD})$ rounds. In particular, $\rho(1/2) \in O(\log n \log\log n)$ and $\rho(\alpha) \in O(1)$ for any constant
choice of $\alpha > 1/2$.

Proof: A stretch bound of $8kL - 1$ and the fact that the destination will indeed be reached when
following the computed pointers follows from Lemma 4.18. By Lemma 4.9, the running time of
the short-range construction is bounded by $\tilde{O}((\sqrt{n})^{2^L/(2^L-1)})$ rounds w.h.p. The time required for
the skeleton construction is, by Theorem 4.17, $\tilde{O}(n/|S_L|^{1-1/k} + |S_L|^{1+1/k} + \mathrm{HD}) = \tilde{O}((\sqrt{n})^{1+1/k} +
\mathrm{HD})$ w.h.p. To match the desired running time bound of $\tilde{O}(n^{\alpha} + \mathrm{HD})$ rounds, it thus suffices that
$\max\{1/k, 1/(2^L - 1)\} \le 2\alpha - 1 + 1/\log n$ (an additive $1/\log n$ in the exponent maps to a constant
factor). By choice of $k$, this inequality holds for $L := \lceil \log(k+1) \rceil$. The stretch is thus bounded by

\[
\rho(\alpha) + 1 \le 8kL \le
\begin{cases}
8\lceil 1/(2\alpha - 1) \rceil \lceil \log\lceil 1/(2\alpha - 1) \rceil + 1 \rceil & \text{for } \alpha > 1/2 \\
8 \log n \lceil \log\log n + 1 \rceil & \text{for } \alpha = 1/2.
\end{cases}
\]

The bound on the label size follows from Lemma 4.9, our choice of $L$, and the fact that the
long-distance scheme adds only $O(\log n)$ bits to the label. □
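The parameter choices of Theorem 4.19 are easy to tabulate. The sketch below assumes that logarithms are taken to base 2 and that $\log n$ is rounded up when it serves as the value of $k$; both are assumptions of this illustration, and `routing_parameters` is a hypothetical helper name.

```python
import math


def routing_parameters(alpha: float, n: int):
    """Return (k, L, stretch) with L = ceil(log2(k + 1)) and stretch 8kL - 1."""
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [1/2, 1]")
    log_n = math.log2(n)
    if alpha > 0.5 + 1.0 / log_n:
        k = math.ceil(1.0 / (2.0 * alpha - 1.0))
    else:
        k = math.ceil(log_n)
    # For every integer k >= 1, k.bit_length() equals ceil(log2(k + 1)).
    L = k.bit_length()
    return k, L, 8 * k * L - 1
```

For example, $\alpha = 1$ gives $k = L = 1$ and stretch 7, while $\alpha = 3/4$ gives $k = L = 2$ and stretch 31, illustrating how the stretch grows as the running time exponent approaches $1/2$.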

The space complexity of our scheme, i.e., the number of bits of the computed routing tables, is 
also straightforward to bound. 

Corollary 4.20 The size of the routing table at node $v$ computed by the algorithm referenced in
Theorem 4.19 is $\tilde{O}(n^{\alpha})$.

Proof: Observe that the dominant terms in memory consumption are (i) storing the sets $H_v(i)$ and
the next pointers to them for the short-range routing scheme, (ii) storing the routing information for
the paths from the roots $u \in S_i$ of the trees induced by the sets $C_u(i)$, and (iii) storing $G_{S_L,h(S_L)}$
and the next pointers for the long-range scheme. Trivially, the encoding of $G_{S_L,h(S_L)}$ cannot require
more than $\tilde{O}(n^{\alpha})$ memory, as it is broadcast globally over the BFS tree. The routing information
from $C_u(i)$ to the nodes in its tree is $\log^{O(1)} n$ bits [29]. The term from (i) originates from calls to
Algorithm BSP. The routing information that needs to be stored consists of the history of the list
maintained by Algorithm BSP. Hence, if such a call has depth and overlap parameters $h$ and $\Delta$, the
memory required is $O(h \Delta \log n)$. Hence the memory bound for (i) directly follows from the running
time bound from Lemma 4.9. □
5 Extensions and Applications



5.1 Distance Sketches 

The problem of distributed distance sketches requires each node to have a label and store a small 
amount of information (called the sketch), so that each node v can estimate the distance to each 
other node u when given the label of u.* Technically speaking, we have already solved this problem, since
our machinery makes it possible to estimate distances with small stretch. However, since $\alpha \ge 1/2$, the basic
construction will always consume $\Omega(\sqrt{n})$ memory.

If we discard the routing information, we can reduce the space requirements of the sketches at the
expense of increasing the stretch. To this end, we need to reduce the maximal size of the sets $H_v(i)$
as well as the space consumed for storing information on the skeleton graph. Our idea is as follows.
First, we change the sampling probabilities of the sets $S_i$ so that $E[|S_i|] \in \Theta(n^{1-i/L})$, where $i$ ranges from
0 to $L$. With this choice, the expected number of nodes from $S_i$ that are closer to a given node than
the closest node from $S_{i+1}$ is $O(n^{1/L})$, implying that $E[|H_v(i)|] \in O(n^{1/L})$ for all $i$. Second, we do not
choose $S_L$ as skeleton set, but rather $S_{i_0}$ for some level $i_0 < L$, so that the skeleton can be constructed quickly.

To continue applying the short-range scheme beyond stage $i_0$ without increasing the asymptotic time
complexity of the construction, we construct temporary distance sketches and labels for the skeleton
(using the long-range scheme with skeleton set $S_{i_0}$), which allows nodes to estimate their distance to
skeleton nodes locally. Ensuring that the $S_i$ are subsets of the skeleton for $i > i_0$, each node can thus
simulate the short-range algorithm's detection of the sets $H_v(i)$ and the respective distance computation
locally, based on its estimated distance to skeleton nodes; the price we pay is an increase of the stretch
by a factor of $O(L)$ due to imprecise distances.

Theorem 5.1 Given any integer $k \in [1, \log n]$, distance sketches with stretch $\rho(k) = 2k(8k - 3) \in O(k^2)$,
label size $O(k \log n)$, and sketch size $\tilde{O}(n^{1/(2k)})$ can be constructed w.h.p. in the CONGEST
model in $\tilde{O}(n^{1/2+1/(2k)} + HD)$ rounds.

Proof: We use the following algorithm, parametrized by k. 

1. Run the short-range scheme with $k$ stages and expected set sizes $E[|S_i|] = n^{1-i/(2k)}$ for $i \in \{0, \ldots, k\}$.

2. Run the long-range scheme on the skeleton $S_k$.

3. For $k+1 \le i \le 2k-1$, sample a set $S_i \subseteq S_{i-1}$, where each node of $S_{i-1}$ is picked independently with probability
$n^{-1/(2k)}$ in each step. Each node in $S_k$ broadcasts its membership information.

4. For each pair $v \in V$ and $s \in S_k$, set $wd'(v,s) := wd(v, Y_v(k)) + wd^k(Y_v(k), s)$, where $wd^k$ is the distance
estimate obtained from the long-range scheme. For $k+1 \le i \le 2k-1$, compute at each node $v$ the closest node
$Y_v(i) \in S_i$ w.r.t. $wd'$, and the set $H_v(i) := \{s \in S_{i-1} \mid wd'(v,s) < wd'(v, Y_v(i))\}$. Set $H_v(2k) := S_{2k-1}$.

5. Store at each node $v$, for each $i \in \{1, \ldots, 2k\}$: (i) the set $H_v(i)$; (ii) for each $u \in H_v(i)$, the value
$wd(v,u)$ if $i \le k$, or $wd'(v,u)$ if $i > k$. Label node $v$ by $\lambda(v) = (Y_v(i), d(i))_{i \in \{0, \ldots, 2k-1\}}$, where
$d(i) := wd(v, Y_v(i))$ if $i \le k$, and $d(i) := wd'(v, Y_v(i))$ otherwise.

Given label $\lambda(w)$ (which is clearly of size $O(k \log n)$), node $v$ estimates the distance to $w$ by finding
the smallest $i \in \{1, \ldots, 2k\}$ so that $Y_w(i-1) \in H_v(i)$ and adding the respective distance estimates
from $v$ to $Y_w(i-1)$ (locally known) and from $Y_w(i-1)$ to $w$ (from the label). Such an $i$ always exists
because $H_v(2k) = S_{2k-1}$. This completes the description of the algorithm.

* The formulation in [6] permits using both sketches to approximate the distance. However, from the distributed point
of view it is more appropriate to assume that only a minimal amount of information is exchanged.
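
The query rule can be sketched as follows. This is a minimal illustration under assumed data layouts (a dictionary per level for the sketch of $v$, a list of pairs for the label of $w$); neither the layout nor the function name is taken from the paper.

```python
def estimate_distance(sketch_v, label_w):
    """Estimate the distance from v to w.

    sketch_v[i] maps each node of H_v(i) to the distance stored at v,
    for i = 1..2k.  label_w[i] = (Y_w(i), d_w(i)) for i = 0..2k-1.
    Both layouts are assumptions made for this illustration.
    """
    for i in range(1, len(label_w) + 1):
        pivot, dist_pivot_to_w = label_w[i - 1]   # Y_w(i-1) and its distance to w
        stored = sketch_v.get(i, {})
        if pivot in stored:                       # Y_w(i-1) is in H_v(i)
            return stored[pivot] + dist_pivot_to_w
    # Cannot happen if H_v(2k) = S_{2k-1}, as the construction guarantees.
    raise ValueError("no level i with Y_w(i-1) in H_v(i)")
```

Scanning the levels in increasing order returns the estimate for the smallest matching $i$, as the description requires.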

By Corollary 4.7, the stretch of the routes represented by the labels and sketches would be $8k - 3 \in O(k)$
w.h.p. if all distances were exact. The approximation ratio is obtained by multiplying this value
by the maximal stretch of any distance estimates employed in the construction. Up to stage $k$, all
values are exact w.h.p. Thereafter, we use estimates of distances between skeleton nodes and all other
nodes. By the triangle inequality, we have for all $v \in V$ and $s \in S$ (the skeleton $S = S_{i_0}$ with $i_0 = k$) that

\[ wd(v,s) \le wd(v, Y_v(i_0)) + wd(Y_v(i_0), s) \le wd(v, Y_v(i_0)) + wd^k(Y_v(i_0), s) = wd'(v,s). \]

On the other hand,

\begin{align*}
wd(v, Y_v(i_0)) + wd^k(Y_v(i_0), s) &\le wd(v,s) + wd^k(Y_v(i_0), s) && \text{by definition of } Y_v(i_0) \\
&\le 2k \cdot wd(Y_v(i_0), s) && \text{by Theorem 4.17.}
\end{align*}

Hence the stretch of the distance estimates is bounded by $\rho(k) = 2k(8k - 3) \in O(k^2)$.

By Chernoff's bound, for all $i$ we have that $|S_i| \in \Theta(n^{1-i/(2k)})$ w.h.p. Hence, the non-local
part of the construction can be performed with overlap parameter $\Delta_i \in O(n^{1/(2k)} \log n)$ and distance
parameter $h_i \in \tilde{O}(n^{i/(2k)})$ for all $i \in \{1, \ldots, k\}$. We conclude the claimed running time and memory
bounds of $\tilde{O}(n^{1/2+1/(2k)} + HD)$ (time $O(h_i \Delta_i)$ for each step of the short-range scheme, time
$\tilde{O}(n/|S_k|^{1-1/k} + |S_k|^{1+1/k} + HD)$ for the long-range scheme, and time $O(|S_k| + HD)$ for the additional broadcast step)
and $\tilde{O}(n^{1/(2k)})$, respectively, completing the proof. □
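
As a quick sanity check of the exponents in Theorem 5.1, the sketch below tabulates, for a given $k$, the stretch $2k(8k-3)$, the exponents of $n$ in the time and memory bounds, and the exponents of the set sizes $|S_i|$. It also records that the per-stage cost $h_i \Delta_i$ (exponent $(i+1)/(2k)$, ignoring logarithmic factors) peaks at stage $k$ with exponent $1/2 + 1/(2k)$. The function and key names are illustrative assumptions.

```python
from fractions import Fraction


def sketch_parameters(k: int) -> dict:
    """Exponents of n (polylog factors ignored) for the distance sketches."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    two_k = 2 * k
    # |S_i| is about n^(1 - i/(2k)) for levels i = 0..2k-1.
    set_size_exponents = [1 - Fraction(i, two_k) for i in range(two_k)]
    # h_i * Delta_i is about n^(i/(2k)) * n^(1/(2k)) for stages i = 1..k.
    stage_cost_exponents = [Fraction(i + 1, two_k) for i in range(1, k + 1)]
    return {
        "stretch": 2 * k * (8 * k - 3),
        "time_exponent": Fraction(1, 2) + Fraction(1, two_k),
        "memory_exponent": Fraction(1, two_k),
        "set_size_exponents": set_size_exponents,
        "max_stage_cost_exponent": max(stage_cost_exponents),
    }
```

For $k = 2$ this gives stretch 52, time exponent $3/4$, and memory exponent $1/4$.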

We note that a distributed implementation of Thorup-Zwick distance oracles with stretch $2k - 1$
and running time $\tilde{O}(SPD \cdot n^{1/k})$ was recently given by Das Sarma et al. [6]. Intuitively, the reason for
the discrepancy is that [6] makes no use of the skeleton graph. In general, our running time and
the one from [6] are incomparable (one may run both algorithms in parallel and use the output of the
one that terminates first).

5.2 Approximate Weighted Diameter 

Obtaining an approximation of the weighted diameter is simpler than constructing distance sketches. 
Dropping the short-range scheme from the construction, we can prove the following result. 

Theorem 5.2 For any $k \in \mathbb{N}$, the weighted diameter WD can be approximated w.h.p. to within a
factor of $2k + 1$ in the CONGEST model in $\tilde{O}(n^{1/2+1/(2k)} + HD)$ rounds.

Proof: We use the following streamlined version of our algorithm. 

1. Select a uniformly random skeleton $S$, where $\Pr[v \in S] = 1/\sqrt{n}$ independently for all $v \in V$.

2. Apply Algorithm 3 to construct a $(2k-1)$-spanner of $G_{S,h(S_R)}$. Let $WD^k$ be the weighted
diameter of the spanner of $G_{S,h(S_R)}$ (which can be computed locally).

3. Apply Algorithm BSP, where all nodes in $S$ function as the same source: $\mathrm{source}(v) = 1$ for all
$v \in S$ and $\mathrm{source}(v) = \bot$ for all $v \notin S$. Use $\Delta = 1$ and $h \in \Theta(\sqrt{n} \log n)$.

4. Find the maximal distance $d_{\max}$ computed by any node and output $2d_{\max} + WD^k$.
Regarding the time complexity, note that Step 2 requires $\tilde{O}(n^{1/2+1/(2k)} + HD)$ rounds by Theorem 4.17,
Step 3 requires $\tilde{O}(\sqrt{n})$ time by Theorem 4.4, and Step 4 takes $O(HD)$ rounds.

Regarding the approximation ratio, consider any $v, w \in V$, and let $s_v, s_w \in S$ be the nodes in $S$ closest
to $v$ and $w$, respectively. Then

\[ wd(v,w) \le wd(v, s_v) + wd(s_v, s_w) + wd(s_w, w) \le d_{\max} + WD^k + d_{\max}, \]

and hence $WD \le WD^k + 2d_{\max}$. On the other hand, we have

\[ WD \ge \max\Bigl\{d_{\max}, \max_{s,t \in S}\{W_{S,h(S_R)}(s,t)\}\Bigr\} \ge \frac{2d_{\max} + (2k-1)\max_{s,t \in S}\{W_{S,h(S_R)}(s,t)\}}{2k+1} \ge \frac{2d_{\max} + WD^k}{2k+1}, \]

where the last inequality holds w.h.p. by Theorem 4.17. □
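
The sandwich bound can be checked on small inputs with a centralized simulation. The sketch below is not the distributed algorithm: it takes the skeleton as an explicit argument instead of sampling it, and it uses exact shortest-path distances between skeleton nodes in place of the spanner, which corresponds to the case $2k - 1 = 1$ (so the output should lie between WD and $3 \cdot$ WD). The graph is assumed to be connected and undirected, and all names are illustrative.

```python
import heapq


def dijkstra(adj, sources):
    """Multi-source Dijkstra; adj maps node -> list of (neighbor, weight)."""
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def estimate_diameter(adj, skeleton):
    """Return 2 * d_max + (weighted diameter among skeleton nodes)."""
    # d_max: largest distance from any node to its closest skeleton node.
    d_max = max(dijkstra(adj, skeleton).values())
    # Exact skeleton diameter stands in for the spanner diameter WD^k.
    wd_skeleton = max(dijkstra(adj, [s])[t] for s in skeleton for t in skeleton)
    return 2 * d_max + wd_skeleton


def true_diameter(adj):
    """Exact weighted diameter, by running Dijkstra from every node."""
    return max(max(dijkstra(adj, [v]).values()) for v in adj)
```

On a unit-weight path with five nodes and skeleton $\{0, 4\}$, the estimate is $2 \cdot 2 + 4 = 8$, which lies between the true diameter 4 and three times that value.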

5.3 Distributed Approximation for Generalized Steiner Forest 

In this section we explain how to utilize our routing scheme to obtain a fast distributed algorithm for 
the Generalized Steiner Forest problem (GSF), defined as follows.

Generalized Steiner Forest (GSF)

Input: A weighted graph $G = (V, E, W)$, a set of terminals $T \subseteq V$, and for each terminal
$t \in T$ a component number $C(t)$.

Output: A subset of the edges $F \subseteq E$ such that for all pairs $s, t \in T$ with $C(s) = C(t)$,
we have that $s$ is connected to $t$ in the graph $(V, F)$.
Goal: Minimize $\sum_{e \in F} W(e)$.

We note that sometimes, the connectivity requirement C is expressed as a set of node pairs. While 
the size of the input representation may differ, the two variants are equivalent for our purposes; using 
our spanner construction, we can obtain the component-based description within $O(|T| + HD)$ rounds
from the pair-based formulation.

In the distributed setting, we assume that each node knows whether it is a terminal and, if so,
what its component number is. Clearly, we can establish global knowledge of $T$ and the component
numbers within time $O(|T| + HD)$ by broadcasting the respective pairs of values over a BFS tree.

We now present a solution to GSF. We start with a generic reduction to a centralized algorithm 
which abstracts away the underlying graph G, and uses a graph whose nodes are just the terminals 
and edge weights are inter-terminal distance estimates. 

To analyze Algorithm 5, we consider two simple transformations of the input instance and state 
their effect on the cost of the solution. First, consider the effect of using just distances between 
terminals (and not the whole graph). The following lemma bounds the effect of this simplification. 
Given an instance $I$ of GSF, we use $OPT(I)$ to denote any fixed optimal solution for $I$.
Algorithm 5: Distributed algorithm for GSF. ALG is any centralized approximation algorithm
for GSF.

input : terminal components // locally known: v knows whether $v \in T$ and if so, its component number

output: F: edges in the Steiner forest

1 Obtain distance estimates for all distances $wd(s,t)$ with $s, t \in T$.

2 Simulate ALG on the graph $G' = (T, E', W')$ and the same terminal components, where
$E' := \{\{s,t\} \mid s, t \in T \wedge s \ne t\}$, and for all $s, t \in T$, $W'(s,t) := \min\{wd_s(t), wd_t(s)\}$, where $wd_s(t)$ denotes
the estimate of $wd(s,t)$ computed at $s$ (given the label of $t$). Denote by $F'$ the computed solution.

3 Identify and output all edges on paths in $G$ that correspond to edges in $F'$.
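
Step 2 builds a complete graph on the terminals whose weights are the symmetrized estimates. A minimal sketch of that construction follows; the nested-dictionary input `estimates[s][t]` (standing for $wd_s(t)$) and the function name are assumptions made for illustration.

```python
from itertools import combinations


def terminal_graph(estimates):
    """Build W' on the terminal set from one-sided distance estimates.

    estimates[s][t] is the estimate of wd(s, t) computed at s.
    Returns a dict mapping frozenset({s, t}) to min(wd_s(t), wd_t(s)).
    """
    terminals = sorted(estimates)
    return {
        frozenset((s, t)): min(estimates[s][t], estimates[t][s])
        for s, t in combinations(terminals, 2)
    }
```

Taking the minimum of the two one-sided estimates keeps $W'$ symmetric, so any centralized ALG for undirected instances can be run on the result.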



Lemma 5.3 Let $I = (G = (V, E, W), T, C)$ be an instance of GSF. Define an instance $I' = (G', T, C)$,
where $G' = (T, E', W')$ with $E' = \{\{s,t\} \mid s, t \in T\}$ and $W'(s,t) = wd_G(s,t)$. Then $W'(OPT(I')) \le
2W(OPT(I))$.

Proof: The proof is a generalization of the standard argument for Steiner trees [28]. Let $F =
OPT(I)$. Let $C_1 = (V_1, F_1), \ldots, C_m = (V_m, F_m)$ be the connected components of $F$. By optimality
of $F$, the components are trees. Fix a component $C_j$, and consider an Euler tour of its tree. Let
$\sigma = \langle v_0, \ldots, v_{2(|V_j|-1)} = v_0 \rangle$ be the sequence of nodes visited by the tour. Since each edge in $F_j$ is visited
exactly twice in $\sigma$, the total weight of edges in the tour is $2W(F_j)$. Define the node
sequence $\sigma' = \langle u_0, \ldots, u_{|V_j|-1} \rangle$ obtained from $\sigma$ by omitting the repeated occurrences of nodes.
Consider now the set of edges $F'_j := \{\{u_{i-1}, u_i\} \mid 0 < i < |V_j|\}$ in $G'$. Since the edges in $G'$ are
shortest paths in $G$, clearly the weight of $F'_j$ is not more than the total weight of edges in $\sigma$, namely
$W'(F'_j) \le 2W(F_j)$. Finally, note that $F' = \bigcup_{j=1}^{m} F'_j$ is a feasible solution for $I'$, and therefore

\[ W'(OPT(I')) \le W'(F') = \sum_{j=1}^{m} W'(F'_j) \le \sum_{j=1}^{m} 2W(F_j) = 2W(OPT(I)). \]

□

Next, consider replacing edge weights by $\rho$-approximate weights.

Lemma 5.4 Let $I = (G = (V, E, W), T, C)$ and $I' = (G' = (V, E, W'), T, C)$ be instances of GSF
differing only in the edge weights as follows: for all $e \in E$, $W(e) \le W'(e) \le \rho W(e)$ for some $\rho \ge 1$.
Then $W(OPT(I')) \le \rho W(OPT(I))$.

Proof: $W(OPT(I')) \le W'(OPT(I')) \le W'(OPT(I)) \le \rho W(OPT(I))$. □

The preceding two lemmas show that using $\rho$-approximate distances and a centralized $\alpha$-approximation
algorithm for GSF, we obtain a distributed $(2\rho\alpha)$-approximation algorithm for GSF.
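
One way to chain the two lemmas is sketched below. The notation is introduced here for illustration only: $I'$ is the terminal instance with exact distances $W'$ from Lemma 5.3, $I''$ is the same instance with estimated weights $W''$ satisfying $W' \le W'' \le \rho W'$, and $F''$ is the output of ALG on $I''$. The derivation further assumes that every edge $\{s,t\} \in F''$ is realized in $G$ by a path of weight at most $W''(s,t)$, so that the returned edge set costs at most $W''(F'')$.

```latex
\begin{align*}
W''(F'') &\le \alpha \, W''(OPT(I''))
  && \text{ALG is an $\alpha$-approximation} \\
&\le \alpha \, W''(OPT(I'))
  && \text{$OPT(I')$ is feasible for $I''$} \\
&\le \alpha \rho \, W'(OPT(I'))
  && W'' \le \rho W' \\
&\le 2 \alpha \rho \, W(OPT(I))
  && \text{Lemma 5.3.}
\end{align*}
```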

It remains to show how to efficiently implement Algorithm 5 in the CONGEST model. The key 
is Step 1: Steps 2 and 3 will be performed locally at each node. 

Corollary 5.5 For any integer $k \in [1, \log n]$, Algorithm 5 can be executed in the CONGEST model
in $\tilde{O}((\sqrt{n} + |T|)^{1+1/k} + HD)$ rounds with stretch factor $\rho(k) = 2k - 1$.

Proof: We apply the long-range routing scheme with skeleton set $S := T \cup S_R$, where $S_R$ is sampled
uniformly and independently at random with probability $n^{-1/2}$ from $V$. Lemma 4.15 implies that we
can perform Step 1 of Algorithm 5 within $\tilde{O}(n|S|^{1/k}/|S_R| + |S|^{1+1/k} + HD) = \tilde{O}((\sqrt{n} + |T|)^{1+1/k} + HD)$
rounds with stretch $\rho(k) = 2k - 1$. Moreover, at the end of this step, all nodes know the spanner of
the skeleton graph and can therefore locally compute $W'$. As remarked earlier, all nodes can learn the
terminal components within $O(|T| + HD) \subseteq \tilde{O}((\sqrt{n} + |T|)^{1+1/k} + HD)$ rounds. With this information in
place, all nodes can locally simulate ALG on $G'$ and thus perform Step 2 of the algorithm. According
to Lemma 4.16, the nodes on paths in $G$ corresponding to edges in $G'$ can learn of their membership
within $\tilde{O}(n|S|^{1/k}/|S_R| + |S|^{1+1/k} + HD)$ rounds as well. Afterwards, Step 3 of the algorithm can be
completed locally. Summing up the running time bounds for the individual steps, we conclude
that the overall time complexity is $\tilde{O}((\sqrt{n} + |T|)^{1+1/k} + HD)$ as claimed. □

Altogether, we arrive at the following result. 

Theorem 5.6 Given any integer $k \in [1, \log n]$ and any centralized $\alpha$-approximation algorithm for GSF,
GSF can be solved in the CONGEST model with approximation ratio $2\alpha(2k-1)$ in $\tilde{O}((\sqrt{n} + |T|)^{1+1/k} +
HD)$ rounds, where $T$ denotes the set of terminal nodes.

Proof: Corollary 5.5 proves that Algorithm 5 can be implemented in $\tilde{O}((\sqrt{n} + |T|)^{1+1/k} + HD)$ rounds,
with distance estimates of stretch $\rho(k) = 2k - 1$. The approximation guarantee therefore follows from
Lemmas 5.3 and 5.4. □

We note that one can also implement Step 1 by computing distance sketches as in Theorem 4.19
without including the entire terminal set into the skeleton (i.e., $G'$ does not become global knowledge),
and simulate ALG sequentially, taking $O(HD)$ rounds per step. This will reduce the overall running time in
case $|T| \gg \sqrt{n}$ and HD times the step complexity of ALG (in terms of the number of globally
synchronized steps) is small compared to $|T|^{1+1/k}$. However, this approach has two drawbacks. First,
ALG cannot be arbitrary, but must admit simulation via a BFS tree using small messages.
Second, the approximation ratio deteriorates according to the number of stages of the short-range
scheme, as the number of stages contributes a multiplicative factor to $\rho$.

Discussion. It is known that the special case of MST, where all nodes are terminals in a single
component, has a worst-case running time of $\tilde{\Omega}(\sqrt{n})$ even if the hop diameter is $O(\log n)$ [23]. However,
it is unclear whether this lower bound holds if the number of terminals is small, and in turn, whether
a larger number of terminal components makes the problem harder. For instance, for a single pair of
terminals the problem reduces to selecting a single approximate shortest path; we are not aware of
any non-trivial lower bound on this problem. Khan et al. [15] provide an $O(\log n)$-approximation to
GSF within $\tilde{O}(SPD \cdot \gamma)$ rounds, where $\gamma$ denotes the number of terminal components. The algorithm
from [15] matches this bound up to a factor of $\log^{O(1)} n$ if $SPD \cdot \gamma \in O(\sqrt{n})$; our approach does so in case
$t \in O(\sqrt{n})$, where $t$ is the number of terminal nodes. Note that the two running time bounds are in general
incomparable: for approximation ratio $O(\log n)$, we achieve time complexity $\tilde{O}(\sqrt{n} + t + HD)$; there
are instances for which $SPD \cdot \gamma \ll \sqrt{n} + t$ as well as those where $\sqrt{n} + t \ll SPD \cdot \gamma$. However, our
approach is superior in that we can, for any integer $k \in [1, \log n]$, ensure an approximation ratio of
$O(k)$ at the expense of a slightly larger running time of $\tilde{O}((\sqrt{n} + t)^{1+1/k} + HD)$ rounds. The authors
of [15] employ probabilistic tree embeddings, a technique for which an approximation ratio of $\Omega(\log n)$
is inherent [10].
5.4 Tight Labels



The presented routing scheme relabels the nodes according to the Voronoi partition on each level. This
yields labels of suboptimal size and makes it impossible for nodes to learn all labels $\lambda(V)$ quickly. We
now present a modification of our routing scheme with labels $\lambda(V) = \{1, \ldots, n\}$, at the cost of a larger
stretch.

Instead of labeling the nodes on each level of the hierarchy independently, we do this by an inductive 
construction. 

1. Define the partial order $\prec$ on $V$ given by $v \prec u$ if (and only if) one of the following is true:

• $v, u \in S_L$ and the identifier of $v$ is smaller than the identifier of $u$.

• $l_v = l_u < L$, $Y_v(l_v+1) = Y_u(l_v+1)$, and $v$ precedes $u$ in a fixed DFS enumeration of the
tree $T_{Y_v(l_v+1)}(l_v+1)$ on $C_{Y_v(l_v+1)}(l_v+1)$ induced by the shortest $h_{l_v}$-hop paths from each
$w \in C_{Y_v(l_v+1)}(l_v+1)$ to $Y_v(l_v+1)$ detected by the invocation of Algorithm BSP in stage $l_v$.

2. Set $\mathrm{count}_v(0) := 1$ for all $v \in V = S_0$. For each level $i \in \{1, \ldots, L\}$, aggregate the sums of the
values $\mathrm{count}_s(i-1)$ of nodes $s \in S_{i-1}$ in subtrees of $T_{Y_s(i)}(i)$ at the roots of these subtrees. We
define for all $s \in S_i$ the value

\[ \mathrm{count}_s(i) := \sum_{s' \in S_{i-1} :\, Y_{s'}(i) = s} \mathrm{count}_{s'}(i-1), \]

which can be computed from the received values. For each level $i$, this operation can be performed
within $O(h_i)$ rounds.

3. Each skeleton node $s \in S_L$ announces $\mathrm{count}_s(L)$ to all other nodes. This requires $O(|S_L| + HD)$
rounds using a BFS tree.

4. Each skeleton node $s \in S_L$ sets

\[ \lambda(s) := 1 + \sum_{s' \in S_L :\, s' \prec s} \mathrm{count}_{s'}(L). \]

5. Starting from level $L$ and proceeding inductively on decreasing $i$, for each $i \in \{1, \ldots, L\}$, each
node $s \in S_i$, and each node $s' \in C_s(i) \cap S_{i-1}$, we inform $s'$ of the value

\[ \lambda(s') := \lambda(s) + 1 + \sum_{s'' \in C_s(i) \cap S_{i-1} :\, s'' \prec s'} \mathrm{count}_{s''}(i-1). \]

Note that this step can be performed in $O(h_i)$ rounds once $\lambda(s)$ is known, due to the information
collected in Step 2.
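
The counting and labeling steps can be illustrated with a centralized sketch. As a simplifying assumption, the per-level trees are collapsed into a single rooted forest: the roots play the role of the skeleton nodes (listed in identifier order), each child list is given in the fixed DFS order, and the count of a node is the size of its subtree. The function names and the dictionary-based forest are illustrative only.

```python
def subtree_counts(children, roots):
    """Bottom-up aggregation: count[v] = number of nodes in v's subtree."""
    counts = {}
    stack = [(r, False) for r in roots]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            counts[node] = 1 + sum(counts[c] for c in children.get(node, []))
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in children.get(node, []))
    return counts


def assign_labels(children, roots):
    """Top-down labeling with the values 1..n, mirroring Steps 4 and 5."""
    counts = subtree_counts(children, roots)
    labels = {}
    offset = 1
    for r in roots:
        labels[r] = offset          # 1 + total count of all preceding roots
        offset += counts[r]
    stack = list(roots)
    while stack:
        parent = stack.pop()
        preceding = 0               # total count of siblings that precede the child
        for c in children.get(parent, []):
            labels[c] = labels[parent] + 1 + preceding
            preceding += counts[c]
            stack.append(c)
    return labels
```

Each subtree receives a contiguous block of labels starting at the label of its root, which is what allows a node to remember only a label range per child.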

From the above arguments and the results from Section 4.2 we can immediately conclude that the 
time complexity of computing these labels is negligible. 

Corollary 5.7 Executing the above construction does not increase the asymptotic time complexity of 
setting up the routing tables. 

By construction, $\lambda(V) = \{1, \ldots, n\}$. Note that at the end of the above construction, each node
$v$ knows $\lambda(v)$ and, for each level $i$ and each of its children in $T_{Y_v(i)}(i)$, it knows the range of labels
associated with this child. For each $v \in V$, set $v_0 := v$ and define inductively for $i \in \{1, \ldots, L\}$ that
$v_i := Y_{v_{i-1}}(i)$. Given any label $\lambda(w)$, we can thus route from any node $v$ to $w$ as follows.

1. Set i := 0. 

2. If $i < L$ and $w_i \notin H_{v_i}(i+1)$, then route to $v_{i+1}$, set $i := i + 1$, and repeat this step. If $i < L$
and $w_i \in H_{v_i}(i+1)$, route to $w_i$ and proceed to the next step. If $i = L$, route to $w_L$ using the
long-range scheme and proceed to the next step.

3. If $i > 0$, then route to $w_{i-1}$, set $i := i - 1$, and repeat this step. Otherwise $w_i = w$ and we are
done.
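
The three routing steps determine a sequence of intermediate targets. The sketch below computes that sequence from the two chains $v_0, \ldots, v_L$ and $w_0, \ldots, w_L$ and a membership table for the sets $H$; the table layout (a dictionary keyed by node and level) and the function name are assumptions for illustration, and the actual hop-by-hop forwarding is not modeled.

```python
def routing_indirections(v_chain, w_chain, H):
    """Return (sequence of intermediate targets, turning level i0).

    v_chain[i] = v_i and w_chain[i] = w_i for i = 0..L.
    H maps (node, level) to the set H_node(level); missing keys mean empty.
    """
    if len(v_chain) != len(w_chain):
        raise ValueError("both chains must have L + 1 entries")
    L = len(v_chain) - 1
    i = 0
    route = [v_chain[0]]
    # Step 2: climb while w_i is not in H_{v_i}(i + 1).
    while i < L and w_chain[i] not in H.get((v_chain[i], i + 1), set()):
        i += 1
        route.append(v_chain[i])
    i0 = i
    # Cross over to w_{i0} (short-range if i0 < L, long-range if i0 = L).
    route.append(w_chain[i0])
    # Step 3: descend along w_{i0 - 1}, ..., w_0 = w.
    while i > 0:
        i -= 1
        route.append(w_chain[i])
    return route, i0
```

The returned list has the form $(v_0, \ldots, v_{i_0}, w_{i_0}, \ldots, w_0)$ with $w_0 = w$.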

The constructed sequence of routing indirections is thus $(v_0, \ldots, v_{i_0}, w_{i_0}, \ldots, w)$, where either $i_0$ is the
minimal level such that $w_{i_0} \in H_{v_{i_0}}(i_0+1)$ or $i_0 = L$. As a result of these indirections, we can no longer give
a bound on the stretch that is linear in the number of levels. However, we can still argue
that $wd(v_i, w_i) \le 4\,wd(v_{i-1}, w_{i-1})$ assuming that $w_{i-1} \notin H_{v_{i-1}}(i)$.

Lemma 5.8 Suppose that for the labeling scheme stated above we have for some integer $1 \le i_0 \le L$
that $w_{i-1} \notin H_{v_{i-1}}(i)$ for all integers $0 < i \le i_0$. Then

\[ wd(v_{i_0}, w_{i_0}) + \sum_{i=1}^{i_0} \bigl(wd(v_{i-1}, v_i) + wd(w_i, w_{i-1})\bigr) \le 2 \cdot 4^{i_0}\, wd(v,w). \]

Proof: We show by induction that $wd(v_i, w_i) \le 4^i\, wd(v,w)$ for all $0 \le i \le i_0$, which is obviously
true for $i = 0$. Analogously to Lemma 4.6 we have that $wd(v_i, v_{i+1}) \le wd(v_i, w_i)$ and $wd(w_i, w_{i+1}) \le
2\,wd(v_i, w_i)$. By the triangle inequality,

\[ wd(v_{i+1}, w_{i+1}) \le wd(v_{i+1}, v_i) + wd(v_i, w_i) + wd(w_i, w_{i+1}) \le 4\,wd(v_i, w_i) \le 4^{i+1}\, wd(v,w). \]

This completes the induction and in addition reveals that

\[ \sum_{i=1}^{i_0} \bigl(wd(v_{i-1}, v_i) + wd(w_i, w_{i-1})\bigr) \le 3 \sum_{i=0}^{i_0-1} wd(v_i, w_i) \le 3 \sum_{i=0}^{i_0-1} 4^i\, wd(v,w) < 4 \cdot 4^{i_0-1}\, wd(v,w). \]

Therefore

\[ wd(v_{i_0}, w_{i_0}) + \sum_{i=1}^{i_0} \bigl(wd(v_{i-1}, v_i) + wd(w_i, w_{i-1})\bigr) \le 2 \cdot 4^{i_0}\, wd(v,w), \]

concluding the proof. □
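
For completeness, the geometric sum behind the last two bounds can be written out as follows (a worked step added for illustration, using only the inequalities stated in the proof):

```latex
\begin{align*}
3 \sum_{i=0}^{i_0-1} 4^i &= 3 \cdot \frac{4^{i_0} - 1}{4 - 1} = 4^{i_0} - 1 < 4^{i_0} = 4 \cdot 4^{i_0 - 1}, \\
\underbrace{4^{i_0}}_{\text{bound on } wd(v_{i_0}, w_{i_0})/wd(v,w)}
  + \underbrace{4^{i_0}}_{\text{bound on the sum}/wd(v,w)} &= 2 \cdot 4^{i_0}.
\end{align*}
```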

Since, by Corollary 5.7, the construction time of the routing scheme is not affected by the above labeling
and routing mechanism, and Lemma 5.8 provides a stretch bound of $2 \cdot 2^{2L}$ for the modified short-range
routes, we obtain the following statement.

Theorem 5.9 Given a £ [1/2,1], let k = \l/{2a - 1)] if a > 1/2 + 1/logn and k = [1/lognJ 
otherwise. Tables for stateless routing and distance approximation with stretch $\rho(\alpha) = 4k \cdot 4^{\lceil \log(k+1) \rceil} +
2k - 1 \in O(k^3)$ and node labels $1, \ldots, n$ can be constructed in the CONGEST model in $\tilde{O}(n^{\alpha} + HD)$
rounds.